Re: [Last-Call] [v6ops] Iotdir last call review of draft-ietf-v6ops-nd-cache-init-05

"Pascal Thubert \(pthubert\)" <pthubert=40cisco.com@xxxxxxxxxxxxxx> · Tue, 15 Sep 2020 10:52:52 +0000

Hello Jen

Many thanks for your detailed answer. Please see below:

Note that I added the IESG in cc since the merge was on their request.

> On 15 Sept. 2020
My birthday 🎂😏

> Thanks a lot for your review and comments!
> The IESG asked me to merge this draft and 6man-grand so only the
> latter will be published.
> However as most of the text from nd-cache-init has been moved to
> 6man-grand, your comments still apply - sorry, I missed your email and
> did not address them in 6man-grand-02. The changes mentioned below
> will appear in -03

Better this way, if you ask me, on both accounts. Merging looks good. And then we can isolate the review changes.

> 
>> On Fri, Sep 11, 2020 at 12:03 AM Pascal Thubert (pthubert)
>> <pthubert=40cisco.com@xxxxxxxxxxxxxx> wrote:
>> ===========================================================================================================
>> Major
>> ===========================================================================================================
>> 
>> 
>> Section 3 lists a number of approaches, but that list does not match the sections 3.x coming next.
>> In particular there is no section that explains why we are not " Making the probing logic on hosts more robust."
>> It seems that if the host sends just one probe to start with, the problem goes away. There must be a reason why this is not done today.
> 
> I've added the following text:
> "
> 
> 8.7. Making the Probing Logic on Hosts More Robust
> 
> Theoretically the probing logic on hosts might be modified to deal
> better with initial packet loss. For example, only one probe can be
> sent or probes retransmit intervals can be reduced. However,
> 
> - This approach does not fix the root cause but just provides a
> work-around for one particular case of probing traffic. Packets are
> still being lost.

If no one knows your address except the node that replies, I’m not sure packets are still being lost. Maybe this item could be merged with your last point?

> - It's rather unlikely that all affected systems could be updated in
> any reasonable timeframe.

Not sure I follow you there. Isn’t it the same for getting this spec implemented? If so, maybe we can omit this argument. Or did you mean something else?

> - It would not solve the problem if there are multiple applications on
> the same host sending traffic and return packets arrive
> simultaneously.

True. It would have to be done in the OS when forming the address and before any application can open a socket. 

Note that some phones send a crafted packet to detect, e.g., a hotel portal. Is that really different?

> - Even if a host sends a single probe, the response might consist of
> multiple packets and therefore might be still affected by the problem
> described in this document.

I guess it takes special crafting to make sure we get only one packet in response, e.g. a TCP SYN. But then that locks resources on the other end.

A variation of a ping over UDP looks better suited. Or forming a security association with the DNS server? It is hard to live without DNS anyway.

> 
> 8.8. Increasing the Buffer Size on Routers
> 
> Increasing the buffer size and buffering more packets would exacerbate
> issues described in [RFC6583] and make the router more vulnerable to
> ND-based denial of service attacks."
> 

Considering the vast address space that can be attacked, there is no amount of memory that will fully protect against a sweeping DoS attack.
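To put rough numbers on that, here is a back-of-the-envelope sketch. The 64-byte entry size and the 8 GiB memory budget are assumed, illustrative figures and do not come from the draft or from any particular router.

```python
# Rough arithmetic: how much of a /64 can a Neighbor Cache cover?
# ENTRY_BYTES and MEMORY_BYTES are assumed, illustrative figures.
PREFIX_LEN = 64
ENTRY_BYTES = 64            # assumed size of one cache entry
MEMORY_BYTES = 8 * 2**30    # assumed 8 GiB set aside for the cache

addresses_on_link = 2 ** (128 - PREFIX_LEN)
max_entries = MEMORY_BYTES // ENTRY_BYTES
coverage = max_entries / addresses_on_link

print(f"addresses in a /{PREFIX_LEN}: {addresses_on_link}")
print(f"entries that fit in memory: {max_entries}")
print(f"fraction of the prefix covered: {coverage:.3e}")
```

Even with generous assumptions, the cache covers only a vanishing fraction of the prefix, so an attacker sweeping the address space can always generate more misses than the router can store.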

The memory in a router is not constrained as it was 20+ years ago. We can allocate many times what we could when classic ND was written. The attack can also be many times faster, but that makes the anomaly more recognizable and the router can raise defenses.

I suppose that a platform that is worth attacking can throttle incoming requests.
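As an illustration of such throttling, here is a minimal token-bucket sketch. It is only one possible approach, not something the draft specifies; the class name, the rate and burst parameters, and the explicit `now` argument (used instead of a real clock to keep the example deterministic) are all assumptions.

```python
class TokenBucket:
    """Rate limiter for cache-miss events that would trigger an NS.

    Hypothetical helper for illustration; time is passed in explicitly.
    """

    def __init__(self, rate_per_sec, burst, now=0.0):
        self.rate = rate_per_sec      # tokens added per second
        self.burst = burst            # maximum tokens that can accumulate
        self.tokens = float(burst)    # start with a full bucket
        self.last = now

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate_per_sec=2, burst=3)
decisions = [bucket.allow(0.0) for _ in range(5)]
later = bucket.allow(1.0)
```

In this sketch a burst of five misses at the same instant lets only the first three start address resolution; the rest are dropped until the bucket refills.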

That is not a perfect response to the sweeping DoS attack though. You can only defeat it if the network has full knowledge of all the on-link addresses and, e.g., the router simply drops any cache miss in hardware. Sadly this draft doesn’t give us that; that’s the other point later below.
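The difference between the two situations can be sketched as a forwarding decision. The function name, the `registered` set and the `complete_registry` flag are hypothetical names for illustration; a real router would do this lookup in hardware.

```python
import ipaddress


def forward_decision(dst, registered, complete_registry):
    """Return 'forward', 'drop', or 'resolve' for an on-link destination."""
    addr = ipaddress.IPv6Address(dst)
    if addr in registered:
        return "forward"
    # With a complete list, a miss is known to be a non-existent address
    # and can be dropped without involving the control plane.
    if complete_registry:
        return "drop"
    # Otherwise the router must fall back to reactive address resolution.
    return "resolve"


registered = {ipaddress.IPv6Address("2001:db8::1")}
```

With only a partial list, as this draft provides, the "resolve" branch remains reachable, and that is the branch a sweeping attack exercises.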

Also, hardware assistance for forwarding was just emerging at the time, and the related issues were disregarded. Implementing a reactive protocol with hardware assist is a great penalty to the router. The draft helps a lot, as the situation can mostly be avoided.

But the complicated code path cannot be removed, for the same reason as above: we do not have a complete list. Exercising it less may mean it deteriorates, since the least exercised code paths are a familiar location for a bug nest.

All in all, this argument pleads for the full proactive solution more than the partial one (I am deliberately avoiding “quick and dirty” because it is not dirty).

> Would it address your comment?
> 

Not yet as you see. There are pros and cons.

Sending a single probe provides a local solution with no dependency on the router.

I believe that the NA is a good thing, better than current state of affairs.

Ideally the draft would describe both and provide recommendations on how to do the single packet as a step 0. Then it would do what it does today as step 1. Then, in conclusion, it would open toward a future with a full proactive solution where the DoS attack is no longer possible.

>> ---------------------------------------------------------------------------------------------------------------------------------------------
>> "
>> Implementing such functionality is much more complicated than all
>>      other solutions as it would involve complex data-control planes
>>      interaction."
>> 
>> As it goes, reactive ND as it stands involve complex data-control planes interactions, the hardware needs to interrupt its process and tell the software in case of a cache miss.
>> This process is not only complicated but subject to DoS attacks and all prone to bugs. The solution eliminates that activity for a new address and that is a major plus for the router. Sadly it does not fix the problem permanently as the cache may be flushed. I believe it is important to mention both early in the draft to better position its value (great) and limits (the Neighbor cache is still a cache so the problem is not eliminated).
> 
> I've added a "Solution Limitations" section:
> 
> Solution Limitations
> 
> The solution described in this document provides some improvement for
> a node configuring a new IPv6 address and start sending traffic from
> it. However that approach does not completely eliminate the scenario
> when a router receives some transit traffic for an address without the
> corresponding Neighbor Cache entry. For example:
> 
> -- If the host starts using an already configured IPv6 address after a
> long period of inactivity, the router might not have the NC entry for
> that address anymore, as old/expired entries are deleted.
> - Flashing the router Neighbor Cache would trigger the packet loss for
> all actively used addresses removed from the cache.
> 
> 

Flushing?

>> ===========================================================================================================
>> Minor
>> ===========================================================================================================
>> "
>>   1.  A host joins the network and receives a Router Advertisement (RA)
>>       packet from the first-hop router (either a periodic unsolicited
>>       RA or a response to a Router Solicitation sent by the host).
>> "
>> Maybe clarify that this is a multicast RA sent to all hosts
> 
> Not necessary. Solicited RAs can be (should be) unicast.
> 

Sure, but then it uses another address, such as a link-local one.

>> "
>> The
>>       RA contains information the host needs to perform Stateless
>>       Address Autoconfiguration ([RFC4862]) and to configure its
>>       network stack.
>> "
>> You could say "SLAAC and/or DHCPv6" for completeness.
> 
> Does RA contain information the host needs to perform DHCPv6? I'm not so sure..
> 

The M and/or O bits ... DHCP goes a very long way to configure the stack.
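For reference, here is a small sketch of where those bits sit in the fixed part of the RA message defined in RFC 4861 (M is the top bit of the flags octet, O the next one). The function name and the sample bytes are made up for illustration, and the checksum is not verified.

```python
import struct

ND_ROUTER_ADVERT = 134  # ICMPv6 type for Router Advertisement


def ra_flags(icmpv6_payload: bytes):
    """Return (managed, other) from the fixed part of an RA message."""
    # Type, Code, Checksum, Cur Hop Limit, flags octet, Router Lifetime
    msg_type, _code, _cksum, _hop_limit, flags, _lifetime = struct.unpack(
        "!BBHBBH", icmpv6_payload[:8])
    if msg_type != ND_ROUTER_ADVERT:
        raise ValueError("not a Router Advertisement")
    managed = bool(flags & 0x80)  # M: addresses available via DHCPv6
    other = bool(flags & 0x40)    # O: other configuration via DHCPv6
    return managed, other


# Made-up sample: type 134, hop limit 64, M and O set, lifetime 1800 s.
sample = bytes([134, 0, 0, 0, 64, 0xC0, 0x07, 0x08])
```

So the RA does tell the host whether to go to DHCPv6, even though the DHCPv6 exchange itself carries the actual configuration.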

> 
>> "
>>                             As in most cases the RA also contains the link-
>>       layer address of the router, the host can populate its Neighbor
>>       Cache with the router's link-local and link-layer addresses.
>> "
>> Maybe also clarify in before that sentence that the source IPv6 address of the RA is a link local address of the router (section 4.2 of RFC 4861)
> 
> Done:
> "
> 
> The RA contains information the host needs to perform SLAAC and to
> configure its network stack. The RA is send from the router's
> link-local address and in most cases also contains the link-layer
> address of the router. As a result the host can populate its Neighbor
> Cache with the router's link-local and link-layer addresses.
> "
> 

-> "is sent". Maybe avoid "in most cases" and use "may" instead?

>> "
>>                                                                          Most router
>>       implementations buffer only one data packet while
>> "
>> Is that something you know for sure? Else, you may indicate instead that the standard only requires the router to hold one data packet.
>> For memory, RFC 4861 section 7.2.2.  "Sending Neighbor Solicitations" says:
>> "
>> ...
>>   While waiting for address resolution to complete, the sender MUST,
>>   for each neighbor, retain a small queue of packets waiting for
>>   address resolution to complete.  The queue MUST hold at least one
>>   packet, and MAY contain more.  However, the number of queued packets
>>   per neighbor SHOULD be limited to some small value.  When a queue
>>   overflows, the new arrival SHOULD replace the oldest entry.  Once
>>   address resolution completes, the node transmits any queued packets.
>> ...
>> "
> 
> Changed to:
> "As per Section 7.2.2 of [RFC4861] Routers MUST buffer at least one
> data packet and MAY buffer more, while resolving the packet
> destination address. However most router implementations limit the
> buffer size to a few packets only, so all subsequent packets for the
> host global address are dropped, until the address resolution process
> is completed."

Not untrue. Does that mean true?
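For clarity, the queueing behaviour quoted from Section 7.2.2 can be sketched as below. This is only an illustration of the rule (hold at least one packet, replace the oldest on overflow, transmit on resolution), not any router's actual code; the class name and the `limit` parameter are assumptions.

```python
from collections import deque


class ResolutionQueue:
    """Packets held for one neighbor while address resolution is pending."""

    def __init__(self, limit=1):
        if limit < 1:
            raise ValueError("the queue must hold at least one packet")
        # deque(maxlen=...) discards the oldest item when a new one
        # arrives at a full queue, matching "replace the oldest entry".
        self.packets = deque(maxlen=limit)

    def enqueue(self, packet):
        self.packets.append(packet)

    def resolved(self):
        """Return queued packets in arrival order and empty the queue."""
        out = list(self.packets)
        self.packets.clear()
        return out


q = ResolutionQueue(limit=1)
for p in ("p1", "p2", "p3"):
    q.enqueue(p)
survivors = q.resolved()
```

With a limit of one, only the last packet to arrive survives until resolution completes, which is the loss the draft is about; a larger limit only shifts the point at which packets start being dropped.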

> 
>> ---------------------------------------------------------------------------------------------------------------------------------------------
>> 
>> "connects to the network for the first time or after a timeout long"
>> 
>> Maybe "inactivity time" is more suitable than "timeout"
> 
> Done.
> 
>> " This option
>>   requires some investigation and discussions and seems to be excessive
>>   for the problem described in this document. "
>> 
>> The option itself is not "excessive", it is a technical solution. Maybe you could clarify what is excessive, e.g., the complexity to migrate, to implement and deploy, or the time till a solution is available commercially on all devices.
> 
> Changed to "This option requires some investigation and discussion.
> However the implementation complexity and unclear adoption timeline
> makes this approach less preferable than one proposed in this
> document."
> 

I’m good with that 

>> ===========================================================================================================
>> Nits
>> ===========================================================================================================
>> 
>> "if a host A has an neighbor": an -> a
>> "same sequence of events happen": happen -> happens
> 
> This text did not make it to 6man-grand anyway.
> 
>> Voila!
> 
> Merci! ;)
> 

De même (me too ;)

Pascal 

> --
> SY, Jen Linkova aka Furry