Level of standardization of the Echo mode of BFD [Re: Tsvart last call review of draft-ietf-bfd-vxlan-07]

Greg Mirsky <gregimirsky@xxxxxxxxx> · Thu, 20 Jun 2019 13:50:24 +0900

Hello Carlos,
could you please refer me to the specification that defines the message format used in the Echo mode of BFD? Is it the BFD control packet? Something else?

Regards,
Greg

On Thu, Jun 20, 2019 at 11:09 AM Carlos Pignataro (cpignata) <cpignata@xxxxxxxxx> wrote:

Hello, Greg,

Please see inline.

On Jun 19, 2019, at 9:58 PM, Greg Mirsky <gregimirsky@xxxxxxxxx> wrote:

Hello Carlos,
thank you for the prompt clarification.
To your questions on demultiplexing BFD control packets with the zero value of the Your Discriminator field:

only BFD control packets with the zero value of the Your Discriminator field are demultiplexed using the information of the inner IP header. I believe that the text is clear and requires that all fields of the inner IP header must be used to demultiplex a received BFD control packet with the zero value in the Your Discriminator field. Which of the fields an implementation uses to create multiple BFD sessions between the pair of VTEPs is implementation specific.

This text repeats what is in the draft, but does not answer any of my questions.

For example:
1. "that all fields of the inner IP header must be used to demultiplex a received BFD control packet”
    -> The text does not say “all fields”, but regardless, do you mean the DSCP and the Evil Bit? IPv6 Flow Label? How *exactly*?
2. How is the mapping of IP (not UDP?) fields to BFD session done?
3. How is this state created and maintained? 
4. Since this is a set of fields on which two systems need to agree (which fields from the inner IP/UDP are mapped needs to be understood by both systems), it cannot be “implementation specific”. Further, the text does not say so.
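To make the question concrete, here is a minimal sketch of one possible reading of the demultiplexing rule. It assumes the key is exactly the three inner-header fields named in Section 5.1 of the draft (inner source IP, inner destination IP, inner source UDP port). The class and method names are hypothetical and are not taken from the draft or from any implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical key: inner source IP, inner destination IP, inner UDP source port.
# Whether these are the right fields is exactly what the thread is debating.
DemuxKey = Tuple[str, str, int]


@dataclass
class BfdSession:
    local_discriminator: int
    key: DemuxKey


class SessionTable:
    """Illustrative session lookup for received BFD control packets."""

    def __init__(self) -> None:
        self._by_discriminator: Dict[int, BfdSession] = {}
        self._by_key: Dict[DemuxKey, BfdSession] = {}

    def add(self, session: BfdSession) -> None:
        self._by_discriminator[session.local_discriminator] = session
        self._by_key[session.key] = session

    def lookup(self, your_discriminator: int, inner_src_ip: str,
               inner_dst_ip: str, inner_udp_src_port: int) -> Optional[BfdSession]:
        # Non-zero Your Discriminator: RFC 5880 demultiplexes on that value alone.
        if your_discriminator != 0:
            return self._by_discriminator.get(your_discriminator)
        # Zero Your Discriminator: fall back to the (assumed) inner-header key.
        return self._by_key.get((inner_src_ip, inner_dst_ip, inner_udp_src_port))
```

With this reading, two sessions between the same pair of VTEPs are distinguishable only if they differ in at least one of the three key fields, which is why both ends would need to agree on the key.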

To your point on the level at which the Echo mode of BFD is specified in RFC 5880, I'll quote the opinion of Jeffrey Haas from the discussion of comments from Shawn Emery on behalf of the SecDir. Shawn had commented:

Echo BFD is out of scope for the document, but does not describe the reason for this or why state this at all?

I've responded:

GIM>> I think that the main reason is that the BFD Echo mode is underspecified. RFC 5880 defined some of the mechanisms related to the Echo mode, but more standardization work may be required.  

And Jeffrey Haas had added:

Speaking as a BFD chair, this is the relevant observation.  BFD Echo is
underspecified to the point where claiming compliance is difficult at best.
In general, it relies on single-hop and the ability to have the remote Echo
client loop the packets. 

BFD Echo cannot be specified in RFC 5880 base spec because it is application specific.

This packet loop may not be practical for several encapsulations and thus is
out of scope for such encapsulations.  Whether this is practical for vxlan
today, or in the presence of future extensions to vxlan is left out of scope
for the core proposal.  

The question remains: for VXLAN encapsulation, this is like a single hop as far as BFD is concerned (single hop VXLAN tunnel).

Since RFC 5881 defines Echo for single hop, can you please elaborate (in the document) why it is out of scope or how it can work?

Best,

Carlos.

Will respond to other questions in a separate mail.

Regards,
Greg

On Thu, Jun 20, 2019 at 10:31 AM Carlos Pignataro (cpignata) <cpignata@xxxxxxxxx> wrote:

Hello, Greg,

> On Jun 19, 2019, at 9:09 PM, Greg Mirsky <gregimirsky@xxxxxxxxx> wrote:

> 

> Hi Carlos,

> thank you for reminding me of our continued discussion with Joel. We are seeking comments from VXLAN experts and would much appreciate it if you have insights on VXLAN to share.

> I've got some clarifying questions before I can respond to you.

Sure.

> To which stage of the three-way handshake you refer as "initial demultiplexing"? I couldn't find this term in RFC 5880.

"Initial demultiplexing" is a well-known term in BFD, referring to the "demultiplexing of the initial packets", i.e., BFD Control packets with Your Discriminator set to zero.

In RFC 5880, see Section 6.3.

https://tools.ietf.org/html/rfc5880#section-6.3

   The method of demultiplexing the initial packets (in which Your
   Discriminator is zero) is application dependent, and is thus outside
   the scope of this specification.

Since initial demultiplexing is indeed application specific, different for one-hop versus multi-hop and dependent upon whether a single or multiple sessions are allowed between a pair of endpoints, I added below two other relevant citations, from application specific BFD specs:

1. https://tools.ietf.org/html/rfc5883#section-4

2. https://tools.ietf.org/html/rfc5882#section-6

> Regarding the applicability of the Echo mode, thank you for pointing to the need for stricter terminology, the Echo mode, as defined in RFC 5880, is underspecified and it will require additional standardization.

No. BFD Echo is not underspecified in RFC 5880.

Please read S5: 
https://tools.ietf.org/html/rfc5880#section-5

   BFD Echo packets are sent in an encapsulation appropriate to the
   environment.  See the appropriate application documents for the
   specifics of particular environments.

BFD Echo is application dependent. 

Therefore, for example, single-hop BFD in RFC 5881 specifies BFD Echo for that application.

Hence, my question stands: why is this draft claiming BFD Echo is out of scope for this BFD application document?

> Future drafts may explore and define how the Echo mode of BFD is used over VXLAN tunnels.

> 

See above.

> Will review and respond to the remaining questions soon.

Thank you. 

The "remaining questions" are still all the questions below :-)

Best,

Carlos.

> 

> Regards,

> Greg

> 

> 

> On Thu, Jun 20, 2019 at 9:14 AM Carlos Pignataro (cpignata) <cpignata@xxxxxxxxx> wrote:

> Hi,

> 

> I have not reviewed this draft before, but triggered by this email, and briefly scanning through a couple of sections, it is unclear to me how some of the mechanics work.

> 

> There are some major issues with the MAC usage and association, as Joel Halpern mentioned in his Rtg Dir review.

> 

> And, additionally, please consider the following comments and questions:

> 

> 

> 1. Underspecification for initialization and initial demultiplexing.

> 

> This document allows multiple BFD sessions between a single pair of VTEPs:

> 

>    An

>    implementation that supports this specification MUST be able to

>    control the number of BFD sessions that can be created between the

>    same pair of VTEPs.

> 

> The implication of this is that BFD single-hop initialization procedures will not work. Instead, there is a need to map the initial demultiplexing.

> 

> This issue is explained in RFCs 5882 and 5883: https://tools.ietf.org/html/rfc5883#section-4 and https://tools.ietf.org/html/rfc5882#section-6

> 

> Section 5.1 says:

> 

>    For such packets, the BFD session MUST be identified

>    using the inner headers, i.e., the source IP, the destination IP, and

>    the source UDP port number present in the IP header carried by the

>    payload of the VXLAN encapsulated packet.  The VNI of the packet

>    SHOULD be used to derive interface-related information for

>    demultiplexing the packet.

> 

> But this does not really explain how to do the initial demultiplexing. Does each BFD session need to have a separate inner source IP address? Or source UDP port? And how often are they recycled or kept as state? How are these mapped?

> Equally importantly, which side is Active?

> And what if there’s a race condition with both sides being Active and setting up redundant sessions?

> 

> 1.b. By the way, based on this, using S-BFD [RFC 7880] might be easier to demux.

> 

> 

> 2. Security

> 

> This document says that the TTL in the inner packet carrying BFD is set to 1. However, RFC 5880 says to use GTSM [RFC 5082], i.e., a value of 255...

> 

> Why is GTSM not used here?
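For context, the general GTSM argument from RFC 5082 can be sketched as below for a plain IP path, where every router hop decrements the TTL by one. This is an illustration only: the function name is hypothetical, and how the argument carries over to the inner header of a VXLAN-encapsulated packet is the open question raised above.

```python
MAX_TTL = 255  # largest value the 8-bit IPv4 TTL / IPv6 Hop Limit field can hold


def remote_sender_can_match(required_ttl: int, hops_away: int) -> bool:
    """Return True if a sender `hops_away` router hops from the receiver can
    pick an initial TTL so that the packet arrives with `required_ttl`.

    Assumes each hop decrements the TTL by exactly one.
    """
    needed_initial_ttl = required_ttl + hops_away
    return 0 <= needed_initial_ttl <= MAX_TTL


# Receiver requires TTL == 255 (GTSM): only a directly attached sender matches.
# Receiver requires TTL == 1: a sender several hops away can match by choosing
# a suitable initial TTL.
```

Under these assumptions, a receive-side check for 255 rejects every off-link sender, while a check for 1 does not.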

> 

> 

> 3. ECMP and fate-sharing under-specification:

> 

> Section 4.1. says:

> 

>    The Outer IP/UDP

>    and VXLAN headers MUST be encoded by the sender as defined in

>    [RFC7348].

> 

> 

> And RFC 7348 says:

> 

>       -  Source Port:  It is recommended that the UDP source port number

>          be calculated using a hash of fields from the inner packet --

>          one example being a hash of the inner Ethernet frame's headers.

>          This is to enable a level of entropy for the ECMP/load-

>          balancing of the VM-to-VM traffic across the VXLAN overlay.

>          When calculating the UDP source port number in this manner, it

>          is RECOMMENDED that the value be in the dynamic/private port

>          range 49152-65535 [RFC6335].

> 

> 

> Based on this, depending on the hashing calculation, the outer source UDP port can be different leading to different ECMP treatment. Does something else need to be specified here in regards to the outer UDP source port?
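The dependency on the hash can be illustrated with a short sketch. RFC 7348 does not mandate a particular hash function, so the use of CRC32 here, the function name, and the choice of input bytes are all assumptions made only for illustration.

```python
import zlib

DYNAMIC_PORT_MIN = 49152  # start of the dynamic/private range [RFC6335]
DYNAMIC_PORT_MAX = 65535


def outer_udp_source_port(inner_headers: bytes) -> int:
    """Map a hash of inner-frame header bytes into the dynamic port range.

    CRC32 is an arbitrary stand-in; an implementation may use any hash.
    """
    span = DYNAMIC_PORT_MAX - DYNAMIC_PORT_MIN + 1
    return DYNAMIC_PORT_MIN + (zlib.crc32(inner_headers) % span)
```

Because the inner headers of a BFD packet generally differ from those of the data flows being monitored, the two may hash to different outer source ports and so receive different ECMP treatment.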

> 

> 

> 4. Section 7 says that “ Support for echo BFD is outside the scope of this document”.

> 

> Assuming this means “BFD Echo mode”, why is this out of scope? If this is a single logical hop underneath VXLAN, what’s preventing the use of Echo? Echo’s benefits are huge.

> 

> 

> 5. Terminology

> 

>    Implementations SHOULD ensure that the BFD

>    packets follow the same lookup path as VXLAN data packets within the

>    sender system.

> 

> What is a “lookup path within the sender system”?

> 

> 

> 6. Deployment scenarios

> 

> S3 says:

>    Figure 1 illustrates the scenario with two servers, each of them

>    hosting two VMs.  The servers host VTEPs that terminate two VXLAN

> […]

>                      Figure 1: Reference VXLAN Domain

> 

> 

> However, RFC 7348 Figure 3 lists that as one deployment scenario, not as “the scenario” and “The Reference VXLAN Domain”.

> 

> Best,

> 

> Carlos.

> 

>> On Jun 17, 2019, at 12:58 AM, Greg Mirsky <gregimirsky@xxxxxxxxx> wrote:

>> 

>> Hi Olivier,

>> thank you for your thorough review and your clear and detailed questions. My apologies for the delay in responding. Please find my answers below in-line tagged GIM>>.

>> 

>> Regards,

>> Greg

>> 

>> On Fri, May 31, 2019 at 12:38 PM Olivier Bonaventure via Datatracker <noreply@xxxxxxxx> wrote:

>> Reviewer: Olivier Bonaventure

>> Review result: Ready with Issues

>> 

>> This document has been reviewed as part of the transport area review team's

>> ongoing effort to review key IETF documents. These comments were written

>> primarily for the transport area directors, but are copied to the document's

>> authors and WG to allow them to address any issues raised and also to the IETF

>> discussion list for information.

>> 

>> When done at the time of IETF Last Call, the authors should consider this

>> review as part of the last-call comments they receive. Please always CC

>> tsv-art@xxxxxxxxx if you reply to or forward this review.

>> 

>> I have only limited knowledge of VXLAN and do not know all subtleties of BFD.

>> This review is thus more from a generalist than a specialist in this topic.

>> 

>> Major issues

>> 

>> Section 4 requires that " Implementations SHOULD ensure that the BFD

>>    packets follow the same lookup path as VXLAN data packets within the

>>    sender system."

>> 

>> Why is this requirement only relevant for the lookup path on the sender system?
>> What does this sentence really imply?

>> GIM>> RFC 5880 set the scope of the fault detection of BFD protocol as 

>>    ... the bidirectional path between two forwarding engines, including

>>    interfaces, data link(s), and to the extent possible the forwarding

>>    engines themselves ...

>> The requirement is aimed at the forwarding engine of a BFD system that transmits BFD control packets over a VXLAN tunnel.

>> 

>> Is it a requirement that the BFD packets follow the same path as the data

>> packet for a given VXLAN ? I guess so. In this case, the document should

>> discuss how Equal Cost Multipath could affect this.

>> GIM>> I think that an ECMP environment is more likely to be experienced by a transit node in the underlay. If the BFD session is used to monitor the specific underlay path, then, I agree, we should explain that using the VXLAN payload information to draw path entropy may cause data and BFD packets to follow different underlay routes. But, on the other hand, that is the case for OAM and fault detection in all overlay networks in general.

>> 

>> Minor issues

>> 

>> Section 1

>> 

>> You write "The asynchronous mode of BFD, as defined in [RFC5880],

>>  can be used to monitor a p2p VXLAN tunnel."

>> 

>> Why do you use the word can? Is it a possibility or a requirement?

>> GIM>> In principle, BFD Demand mode may be used to monitor p2p paths as well. I agree, and will re-word to be more assertive:

>>  The asynchronous mode of BFD, as defined in [RFC5880],

>>  is used to monitor a p2p VXLAN tunnel.

>> 

>> NVE has not been defined before and is not in the terminology.

>> GIM>> Will add to the Terminology and expand as:

>> NVE        Network Virtualization Endpoint 

>> 

>> This entire section is not easy to read for an outsider.

>> 

>> Section 3

>> 

>> VNI has not been defined

>> GIM>> Will add to the Terminology section:

>> VNI    VXLAN Network Identifier (or VXLAN Segment ID)

>> 

>> Figure 1 could take less space

>> GIM>> Yes, we can make it a bit denser. Would the following be an improvement?

>>  

>>       +------------+-------------+
>>       |        Server 1          |
>>       | +----+----+  +----+----+ |
>>       | |VM1-1    |  |VM1-2    | |
>>       | |VNI 100  |  |VNI 200  | |
>>       | |         |  |         | |
>>       | +---------+  +---------+ |
>>       | Hypervisor VTEP (IP1)    |
>>       +--------------------------+
>>                             |
>>                             |   +-------------+
>>                             |   |   Layer 3   |
>>                             +---|   Network   |
>>                                 +-------------+
>>                                     |
>>                                     +-----------+
>>                                                 |
>>                                          +------------+-------------+
>>                                          |    Hypervisor VTEP (IP2) |
>>                                          | +----+----+  +----+----+ |
>>                                          | |VM2-1    |  |VM2-2    | |
>>                                          | |VNI 100  |  |VNI 200  | |
>>                                          | |         |  |         | |
>>                                          | +---------+  +---------+ |
>>                                          |      Server 2            |
>>                                          +--------------------------+

>> 

>> 

>> Section 4

>> 

>> I do not see the benefits of having one paragraph in Section 4 followed by only

>> Section 4.1

>> GIM>> Will merge Section 4.1 into 4 with minor required re-wording:

>> 4.  BFD Packet Transmission over VXLAN Tunnel

>> 

>>    BFD packet MUST be encapsulated and sent to a remote VTEP as

>>    explained in this section.  Implementations SHOULD ensure that the

>>    BFD packets follow the same lookup path as VXLAN data packets within

>>    the sender system.

>> 

>>    BFD packets are encapsulated in VXLAN as described below.  The VXLAN

>>    packet format is defined in Section 5 of [RFC7348].  The Outer IP/UDP

>>    and VXLAN headers MUST be encoded by the sender as defined in

>>    [RFC7348].

>> 

>> Section 4.1

>> 

>> The document does not specify when a dedicated MAC address or the MAC address

>> of the destination VTEP must be used. This could affect the interoperability of

>> implementations. Should all implementations support both the dedicated MAC

>> address and the destination MAC address ?

>> GIM>> After further discussion, the authors decided to remove the request for the dedicated MAC address allocation. Only the MAC address of the remote VTEP must be used as the destination MAC address in the inner Ethernet frame. Please check the attached diff between the -07 and the working versions or the working version of the draft.

>> 

>> It is unclear from this section whether IPv4 inside IPv6 and the opposite

>> should be supported or not.

>> GIM>> Any combination of outer IPvX and inner IPvX is possible.

>> 

>> Section 5.

>> 

>> If the received packet does not match the dedicated MAC address nor the MAC

>> address of the VTEP, should the packet be silently discarded or treated

>> differently ?

>> GIM>> As I've mentioned earlier, the authors have decided to remove the use of the dedicated MAC address for BFD over VXLAN.

>> 

>> Section 5.1

>> 

>> Is this a modification to section 6.3 of RFC5880 ? This is not clear

>> GIM>> I think that this section is not a modification but the definition of the application-specific procedure that is outside the scope of RFC 5880:

>>    The method of demultiplexing the initial packets (in which Your

>>    Discriminator is zero) is application dependent, and is thus outside

>>    the scope of this specification.

>> 

>> Section 9

>> 

>> The sentence " Throttling MAY be relaxed for BFD packets

>>    based on port number." is unclear.

>> GIM>> Yes, thank you for pointing to this. The updated text, in the whole paragraph, is as follows:

>> NEW TEXT:

>>    The document requires setting the inner IP TTL to 1, which could be

>>    used as a DDoS attack vector.  Thus the implementation MUST have

>>    throttling in place to control the rate of BFD control packets sent

>>    to the control plane.  On the other hand, over aggressive throttling

>>    of BFD control packets may become the cause of the inability to form

>>    and maintain BFD session at scale.  Hence, throttling of BFD control

>>    packets SHOULD be adjusted to permit BFD to work according to its

>>    procedures.
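The new text does not say how throttling should be done. One common approach is a token bucket, sketched below as an assumption rather than as anything the draft specifies. The class name and parameters are hypothetical, and the caller supplies the clock value so the behavior is deterministic.

```python
class TokenBucket:
    """Illustrative rate limiter for BFD control packets punted to the control plane."""

    def __init__(self, rate_pps: float, burst: int) -> None:
        self.rate = rate_pps          # sustained packets per second
        self.capacity = float(burst)  # maximum burst size, in packets
        self.tokens = float(burst)    # start with a full bucket
        self.last = 0.0               # timestamp of the previous call, in seconds

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never beyond the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The tension described in the new text shows up in the two parameters: a rate or burst set too low drops legitimate BFD control packets at scale, while values set too high weaken the protection.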

>> <draft-ietf-bfd-vxlan-08.txt><Diff_ draft-ietf-bfd-vxlan-07.txt - draft-ietf-bfd-vxlan-08.txt.html>

>