Re: Tsvart last call review of draft-ietf-bfd-vxlan-07

Greg Mirsky <gregimirsky@xxxxxxxxx> · Thu, 20 Jun 2019 10:09:20 +0900

Hi Carlos, thank you for the reminder about our continuing discussion with Joel. We are seeking comments from VXLAN experts and would much appreciate any insights on VXLAN you can share.
I have some clarifying questions before I can respond to you. Which stage of the three-way handshake do you refer to as "initial demultiplexing"? I couldn't find this term in RFC 5880.
Regarding the applicability of the Echo mode, thank you for pointing out the need for stricter terminology. The Echo mode, as defined in RFC 5880, is underspecified and will require additional standardization. Future drafts may explore and define how the Echo mode of BFD is used over VXLAN tunnels.

Will review and respond to the remaining questions soon.

Regards,
Greg

On Thu, Jun 20, 2019 at 9:14 AM Carlos Pignataro (cpignata) <cpignata@xxxxxxxxx> wrote:

Hi,

I have not reviewed this draft before, but triggered by this email, and briefly scanning through a couple of sections, it is unclear to me how some of the mechanics work.

There are some major issues with the MAC usage and association, as Joel Halpern mentioned in his Rtg Dir review.

And, additionally, please consider the following comments and questions:

1. Underspecification for initialization and initial demultiplexing.

This document allows multiple BFD sessions between a single pair of VTEPs:

   An
   implementation that supports this specification MUST be able to
   control the number of BFD sessions that can be created between the
   same pair of VTEPs.

The implication of this is that BFD single-hop initialization procedures will not work. Instead, there is a need to map the initial demultiplexing.

This issue is explained in RFCs 5882 and 5883: https://tools.ietf.org/html/rfc5883#section-4 and https://tools.ietf.org/html/rfc5882#section-6

Section 5.1 says:

   For such packets, the BFD session MUST be identified
   using the inner headers, i.e., the source IP, the destination IP, and
   the source UDP port number present in the IP header carried by the
   payload of the VXLAN encapsulated packet.  The VNI of the packet
   SHOULD be used to derive interface-related information for
   demultiplexing the packet.

But this does not really explain how to do the initial demultiplexing. Does each BFD session need to have a separate inner source IP address? Or source UDP port? And how often are they recycled or kept as state? How are these mapped?
Equally importantly, which side is Active?
And what if there’s a race condition with both sides being Active and setting up redundant sessions?
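
To make the question concrete, here is a minimal Python sketch of the lookup that the quoted Section 5.1 text seems to imply. It assumes a session table keyed on the VNI plus the inner source IP, destination IP, and source UDP port. The names SessionKey, SessionTable, and demux are hypothetical and come from neither the draft nor RFC 5880. The only part taken from RFC 5880 is that a nonzero Your Discriminator selects the session on its own.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SessionKey:
    """Hypothetical demux key built from the fields named in Section 5.1."""
    vni: int
    inner_src_ip: str
    inner_dst_ip: str
    inner_src_port: int


class SessionTable:
    """Illustrative session table; an assumption, not the draft's design."""

    def __init__(self) -> None:
        self._by_discriminator: Dict[int, str] = {}
        self._by_key: Dict[SessionKey, str] = {}

    def add(self, name: str, local_discr: int, key: SessionKey) -> None:
        if key in self._by_key:
            # Two sessions sharing a key cannot be told apart while
            # Your Discriminator is still zero.
            raise ValueError(f"ambiguous initial demux key: {key}")
        self._by_discriminator[local_discr] = name
        self._by_key[key] = name

    def demux(self, your_discr: int, key: SessionKey) -> Optional[str]:
        # A nonzero Your Discriminator selects the session by itself.
        if your_discr != 0:
            return self._by_discriminator.get(your_discr)
        # Initial packets fall back to the application-specific key.
        return self._by_key.get(key)
```

With a table like this, two sessions between the same pair of VTEPs are only distinguishable at start-up if at least one field of the key differs, which is exactly what the text leaves open.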

1.b. By the way, based on this, using S-BFD [RFC 7880] might be easier to demux.

2. Security

This document says that the TTL in the inner packet carrying BFD is set to 1. However, RFC 5880 says to use GTSM [RFC 5082], i.e., a value of 255.

Why is GTSM not used here?
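
As a rough illustration of why the two values differ in strength, the Python sketch below models a plain routed path in which every router decrements the TTL by one. It is a simplified model of the general GTSM rationale, not of VXLAN forwarding, and the function names are hypothetical. Whether a receiver following the draft validates the inner TTL at all is not stated in the text quoted here, so accept_ttl_one is an assumption.

```python
GTSM_TTL = 255  # value a GTSM sender puts in the packet (RFC 5082)


def ttl_on_arrival(sent_ttl: int, router_hops: int) -> int:
    """TTL seen by the receiver after router_hops routers each decrement it.

    A result of 0 means the packet would have been dropped in transit.
    """
    return max(sent_ttl - router_hops, 0)


def accept_gtsm(received_ttl: int) -> bool:
    """GTSM-style check for a directly connected peer: only 255 passes."""
    return received_ttl == GTSM_TTL


def accept_ttl_one(received_ttl: int) -> bool:
    """Hypothetical receiver that only requires the TTL to equal 1."""
    return received_ttl == 1
```

In this model a sender five routers away can never make a packet arrive with TTL 255, but it can make one arrive with TTL 1 simply by sending it with TTL 6.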

3. ECMP and fate-sharing under-specification:

Section 4.1. says:

   The Outer IP/UDP
   and VXLAN headers MUST be encoded by the sender as defined in
   [RFC7348].

And RFC 7348 says:

      -  Source Port:  It is recommended that the UDP source port number
         be calculated using a hash of fields from the inner packet --
         one example being a hash of the inner Ethernet frame's headers.
         This is to enable a level of entropy for the ECMP/load-
         balancing of the VM-to-VM traffic across the VXLAN overlay.
         When calculating the UDP source port number in this manner, it
         is RECOMMENDED that the value be in the dynamic/private port
         range 49152-65535 [RFC6335].

Based on this, depending on the hashing calculation, the outer source UDP port can differ, leading to different ECMP treatment. Does something else need to be specified here with regard to the outer UDP source port?
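
For illustration only, the Python sketch below shows one way a sender might derive the outer UDP source port from inner header bytes, as the RFC 7348 text recommends. CRC-32 and the function name outer_udp_source_port are arbitrary assumptions; RFC 7348 does not mandate a particular hash or a particular set of fields.

```python
import zlib

DYNAMIC_PORT_MIN = 49152
DYNAMIC_PORT_MAX = 65535


def outer_udp_source_port(inner_headers: bytes) -> int:
    """Map a hash of the inner header bytes into the 49152-65535 range.

    CRC-32 is an arbitrary choice made for this sketch.
    """
    span = DYNAMIC_PORT_MAX - DYNAMIC_PORT_MIN + 1
    return DYNAMIC_PORT_MIN + zlib.crc32(inner_headers) % span
```

Because the inner headers of a BFD packet differ from those of a VM-to-VM data frame, the two inputs will generally hash to different ports, so the underlay may place them on different ECMP paths.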

4. Section 7 says that “Support for echo BFD is outside the scope of this document”.

Assuming this means “BFD Echo mode”, why is this out of scope? If this is a single logical hop underneath VXLAN, what’s preventing the use of Echo? Echo’s benefits are huge.

5. Terminology

   Implementations SHOULD ensure that the BFD
   packets follow the same lookup path as VXLAN data packets within the
   sender system.

What is a “look up path within a sender system”?

6. Deployment scenarios

Section 3 says:

   Figure 1 illustrates the scenario with two servers, each of them
   hosting two VMs.  The servers host VTEPs that terminate two VXLAN
[…]

                     Figure 1: Reference VXLAN Domain

However, RFC 7348 Figure 3 lists that as one deployment scenario, not as “the scenario” and “The Reference VXLAN Domain”.

Best,

Carlos.

On Jun 17, 2019, at 12:58 AM, Greg Mirsky <gregimirsky@xxxxxxxxx> wrote:

Hi Olivier,
thank you for your thorough review and your clear and detailed questions. My apologies for the delay in responding. Please find my answers below in-line tagged GIM>>.

Regards,
Greg

On Fri, May 31, 2019 at 12:38 PM Olivier Bonaventure via Datatracker <noreply@xxxxxxxx> wrote:

Reviewer: Olivier Bonaventure

Review result: Ready with Issues

This document has been reviewed as part of the transport area review team's ongoing effort to review key IETF documents. These comments were written primarily for the transport area directors, but are copied to the document's authors and WG to allow them to address any issues raised and also to the IETF discussion list for information.

When done at the time of IETF Last Call, the authors should consider this review as part of the last-call comments they receive. Please always CC tsv-art@xxxxxxxx if you reply to or forward this review.

I have only limited knowledge of VXLAN and do not know all subtleties of BFD. This review is thus more from a generalist than a specialist in this topic.

Major issues

Section 4 requires that "Implementations SHOULD ensure that the BFD packets follow the same lookup path as VXLAN data packets within the sender system."

Why is this requirement only relevant for the lookup path on the sender system? What does this sentence really imply?

GIM>> RFC 5880 sets the scope of fault detection for the BFD protocol as
   ... the bidirectional path between two forwarding engines, including
   interfaces, data link(s), and to the extent possible the forwarding
   engines themselves ...
The requirement is aimed at the forwarding engine of a BFD system that transmits BFD control packets over a VXLAN tunnel.

Is it a requirement that the BFD packets follow the same path as the data packet for a given VXLAN? I guess so. In this case, the document should discuss how Equal Cost Multipath could affect this.

GIM>> I think that an ECMP environment is more likely to be encountered by a transit node in the underlay. If the BFD session is used to monitor a specific underlay path, then, I agree, we should explain that using the VXLAN payload information to derive path entropy may cause data and BFD packets to follow different underlay routes. But, on the other hand, that is the case for OAM and fault detection in all overlay networks in general.

Minor issues

Section 1

You write "The asynchronous mode of BFD, as defined in [RFC5880], can be used to monitor a p2p VXLAN tunnel."

Why do you use the word "can"? Is it a possibility or a requirement?

GIM>> In principle, BFD Demand mode may be used to monitor p2p paths as well. I agree and will re-word the sentence to be more assertive:
 The asynchronous mode of BFD, as defined in [RFC5880],
 is used to monitor a p2p VXLAN tunnel.

NVE has not been defined before and is not in the terminology.

GIM>> Will add to the Terminology and expand as:
NVE        Network Virtualization Endpoint 

This entire section is not easy to read for an outsider.

Section 3

VNI has not been defined

GIM>> Will add to the Terminology section:
VNI    VXLAN Network Identifier (or VXLAN Segment ID)

Figure 1 could take less space

GIM>> Yes, I can make it a bit denser. Would the following be an improvement?

      +------------+-------------+
      |        Server 1          |
      | +----+----+  +----+----+ |
      | |VM1-1    |  |VM1-2    | |
      | |VNI 100  |  |VNI 200  | |
      | |         |  |         | |
      | +---------+  +---------+ |
      | Hypervisor VTEP (IP1)    |
      +--------------------------+
                            |
                            |   +-------------+
                            |   |   Layer 3   |
                            +---|   Network   |
                                +-------------+
                                    |
                                    +-----------+
                                                |
                                         +------------+-------------+
                                         |    Hypervisor VTEP (IP2) |
                                         | +----+----+  +----+----+ |
                                         | |VM2-1    |  |VM2-2    | |
                                         | |VNI 100  |  |VNI 200  | |
                                         | |         |  |         | |
                                         | +---------+  +---------+ |
                                         |      Server 2            |
                                         +--------------------------+

Section 4

I do not see the benefits of having one paragraph in Section 4 followed by only Section 4.1

GIM>> Will merge Section 4.1 into 4 with minor required re-wording:
4.  BFD Packet Transmission over VXLAN Tunnel

   BFD packet MUST be encapsulated and sent to a remote VTEP as
   explained in this section.  Implementations SHOULD ensure that the
   BFD packets follow the same lookup path as VXLAN data packets within
   the sender system.

   BFD packets are encapsulated in VXLAN as described below.  The VXLAN
   packet format is defined in Section 5 of [RFC7348].  The Outer IP/UDP
   and VXLAN headers MUST be encoded by the sender as defined in
   [RFC7348].

Section 4.1

The document does not specify when a dedicated MAC address or the MAC address of the destination VTEP must be used. This could affect the interoperability of implementations. Should all implementations support both the dedicated MAC address and the destination MAC address?

GIM>> After further discussion, the authors decided to remove the request for the dedicated MAC address allocation. Only the MAC address of the remote VTEP must be used as the destination MAC address in the inner Ethernet frame. Please check the attached diff between the -07 and the working versions, or the working version of the draft.

It is unclear from this section whether IPv4 inside IPv6 and the opposite should be supported or not.

GIM>> Any combination of outer IPvX and inner IPvX is possible.

Section 5.

If the received packet does not match the dedicated MAC address nor the MAC address of the VTEP, should the packet be silently discarded or treated differently?

GIM>> As I mentioned earlier, the authors have decided to remove the use of the dedicated MAC address for BFD over VXLAN.

Section 5.1

Is this a modification to Section 6.3 of RFC 5880? This is not clear.

GIM>> I think that this section is not a modification but the definition of an application-specific procedure that is outside the scope of RFC 5880:
   The method of demultiplexing the initial packets (in which Your
   Discriminator is zero) is application dependent, and is thus outside
   the scope of this specification.

Section 9

The sentence "Throttling MAY be relaxed for BFD packets based on port number." is unclear.

GIM>> Yes, thank you for pointing this out. The updated text of the whole paragraph is as follows:
NEW TEXT:
   The document requires setting the inner IP TTL to 1, which could be
   used as a DDoS attack vector.  Thus the implementation MUST have
   throttling in place to control the rate of BFD control packets sent
   to the control plane.  On the other hand, over aggressive throttling
   of BFD control packets may become the cause of the inability to form
   and maintain BFD session at scale.  Hence, throttling of BFD control
   packets SHOULD be adjusted to permit BFD to work according to its
   procedures.
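
One common way to implement such throttling is a token bucket. The Python sketch below is only an assumption about how an implementation might do it; the draft does not prescribe a mechanism, and the class name TokenBucket and its parameters rate_pps and burst are hypothetical.

```python
class TokenBucket:
    """Illustrative rate limiter for BFD control packets sent to the control plane."""

    def __init__(self, rate_pps: float, burst: int) -> None:
        self.rate_pps = rate_pps    # sustained packets per second allowed
        self.burst = burst          # maximum back-to-back packets allowed
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # timestamp of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a packet arriving at time now may be passed up."""
        elapsed = max(now - self.last, 0.0)
        self.last = max(self.last, now)
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate_pps)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The trade-off described in the new text shows up in the choice of rate_pps: if it is set below the aggregate rate at which the configured sessions legitimately send control packets, sessions will fail to come up or will flap.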

<draft-ietf-bfd-vxlan-08.txt><Diff_ draft-ietf-bfd-vxlan-07.txt - draft-ietf-bfd-vxlan-08.txt.html>