I am the assigned Gen-ART reviewer for this draft. The General Area Review Team (Gen-ART) reviews all IETF documents being processed by the IESG for the IETF Chair. Please treat these comments just like any other last call comments.

For more information, please see the FAQ at <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Document: draft-ietf-bfd-seamless-use-case-04
Reviewer: Dale R. Worley
Review Date: 2016-04-01
IETF LC End Date: 2016-04-12
IESG Telechat date: 2016-05-05

Summary:

This draft is on the right track but has open issues, described in the review.

Major issues:

In various places the description needs to be made clearer. I believe that the authors have a good idea of what is intended, but in some places the descriptions are not clear to the general reader.

Nits/editorial comments:

There are various problems with English usage (e.g., missing articles) and punctuation (e.g., excessive commas), which can be taken care of by the Editor. But the overall structure and clarity of several paragraphs need improvement.

----------------------------------------------------------------------

General

What is the meaning of "seamless"? The term "seamless BFD" is used in the title and in the title of section 2, "Introduction to Seamless BFD", and in exactly one other place in the document:

   If this information is already known to the end-points of a potential BFD session, the initial handshake including an exchange of this node-specific information is unnecessary and it is possible for the end points to begin BFD messaging seamlessly.

At no point is "seamless BFD" or the specific meaning of "seamless" defined. I suspect that the authors have a strong intuitive sense of the behaviors they identify as "seamless", and it would be helpful if that could be stated in the Introduction.

Abstract

The Abstract reads:

   This document provides various use cases for Bidirectional Forwarding Detection (BFD) and various requirements such that extensions could be developed to allow for simplified detection of forwarding failures.

It seems unlikely that adding extensions to a protocol will "simplify" it (other than in the case of "MPLS BFD Session Per ECMP Path"), so it seems that the Abstract could be phrased better.

It seems like a major goal of the draft is making it possible to accelerate the establishment of a BFD session. But that is not mentioned in the Abstract.

Section 1

   Bidirectional Forwarding Detection (BFD) is a lightweight protocol, as defined in [RFC5880], used to detect forwarding failures. Various protocols and applications rely on BFD for failure detection. Even though the protocol is simple, there are certain use cases, where faster setting up of sessions and continuity check of the data forwarding paths is necessary. This document identifies various use cases and requirements related to those, such that necessary enhancements could be made to BFD protocol.

The phrase "Even though the protocol is simple" is not relevant to the remainder of the sentence it appears in and probably can be deleted.

"This document..." would be better as "This document identifies these use cases and the consequent requirements for extensions to the BFD protocol."

The phrase "continuity check of the data forwarding paths" seems to be disconnected. I suspect the problem is a lack of parallelism, due to "setting up" and "check". You probably want to say "faster setting up of sessions and faster continuity checking of the data forwarding paths".

The phrase "complexity, not only from an operations point of view, but also in terms of the speed at which these sessions could be established or deleted" attaches "speed" to "complexity", which isn't quite correct.

Better would be "creates operational complexity, but also causes undesirable delay in establishing or deleting sessions".

Section 2

The second paragraph says:

   In order for BFD to be able to initially verify that a connection is valid and that it connects the expected set of end points, it is necessary to provide the node information associated with the connection at each end point prior to initiating BFD sessions, such that this information can be used to verify that the connection is up and verifiable.

I think it would help if the nature of the "node information" were made explicit. It seems that this paragraph is strongly related to the aspect of BFD that is *not* defined in RFC 5880:

   The method of demultiplexing the initial packets (in which Your Discriminator is zero) is application dependent, and is thus outside the scope of this specification.

Presumably the "node information" is what is used to perform the demultiplexing of the initial packets. Explaining this in more detail might make the design problem(s) clearer to the inexperienced reader.

The third paragraph seems to be about accelerating the establishment of a BFD session between two nodes. With baseline BFD, establishing a session requires the two nodes to exchange BFD packets, which include the discriminators assigned by each node to the session. It seems that a goal of this draft is to avoid needing to exchange the initial packets before the BFD session is established, with the goal of getting to the established state more quickly. But this is not explicitly stated, nor is the manner in which "seamless BFD" would avoid it.

As far as I can tell, the problem is that before a session is established, BFD is limited to sending one packet per second, and so the establishment of a session requires one or two seconds, regardless of the speed of the link.

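As a rough illustration of that estimate, the following Python sketch simulates the baseline three-way handshake between two systems A and B, using the receive-side state transitions of RFC 5880 section 6.8.6. It assumes that each system transmits only on its periodic timer, at the one-second minimum that RFC 5880 section 6.8.3 imposes while the session is not Up. Jitter, AdminDown, discriminators and authentication are omitted, and the start offset, one-way delay and function names are hypothetical choices for illustration, not taken from the draft or from any implementation.

```python
import heapq

# BFD session states as numbered in RFC 5880 section 4.1 (AdminDown unused here).
ADMIN_DOWN, DOWN, INIT, UP = 0, 1, 2, 3


def next_state(local, received):
    """Receive-side transitions of RFC 5880 section 6.8.6, AdminDown omitted."""
    if local == DOWN and received == DOWN:
        return INIT
    if local == DOWN and received == INIT:
        return UP
    if local == INIT and received in (INIT, UP):
        return UP
    if local == UP and received == DOWN:
        return DOWN
    return local


def time_to_up(interval=1.0, offset_b=0.5, delay=0.001, horizon=10.0):
    """Seconds until both ends are Up, assuming purely periodic transmission.

    offset_b (B's first transmission time) and delay (one-way latency) are
    illustrative assumptions; interval is the one-second minimum while not Up.
    """
    state = {"A": DOWN, "B": DOWN}
    peer = {"A": "B", "B": "A"}
    events = []  # heap of (time, seq, kind, node, carried_state)
    seq = 0
    for node, start in (("A", 0.0), ("B", offset_b)):
        t = start
        while t <= horizon:
            heapq.heappush(events, (t, seq, "tx", node, None))
            seq += 1
            t += interval
    while events:
        t, _, kind, node, carried = heapq.heappop(events)
        if kind == "tx":
            # A Control packet carries the sender's state at transmission time.
            heapq.heappush(events, (t + delay, seq, "rx", peer[node], state[node]))
            seq += 1
        else:
            state[node] = next_state(state[node], carried)
            if state["A"] == UP and state["B"] == UP:
                return t
    return None  # not Up within the horizon


print(f"both ends Up after {time_to_up():.3f} s")  # both ends Up after 1.001 s
```

Under these assumptions, shrinking `delay` (a faster link) barely changes the result; the one-second transmit interval dominates.
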
If the time to establish a BFD session is of central concern, it would be helpful to present an analysis of how long it takes baseline BFD to establish a session, and how long it might take an alternative BFD startup method to establish a session.

In addition to the discriminators, the initial BFD packets also include the BFD packet interval parameters, "Detect Mult", "Desired Min TX Interval", "Required Min RX Interval", and "Required Min Echo RX Interval". What allows BFD to have a very short Detection Time in favorable situations is that the interval parameters can be much shorter than one second. But that implies that any system for quick-starting a BFD session has to transmit the interval parameters as well as the discriminators, or the BFD startup process still has to exchange packets before the full sending rate has been established. Then again, perhaps the phrase "node information" in this paragraph includes the interval parameters, instead of just the discriminators mentioned in the previous paragraph, in which case that should be made clearer.

Is the fourth paragraph a description of the proposed "seamless BFD" and how it differs from baseline BFD?

The fourth paragraph contains "Each of those network entities is assigned a BFD discriminator, to establish a BFD session." But this seems to be incorrect -- each network entity is assigned a BFD discriminator for each BFD session that the entity will participate in (RFC 5880 section 6.3). I can't tell whether this is a fundamental misunderstanding on the part of the authors, merely incorrect wording, or if S-BFD includes a technique by which a node can use the same discriminator for all of its BFD sessions -- that should be clarified.

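To illustrate the per-session discriminator point, here is a minimal Python sketch of receive-side session selection following RFC 5880 sections 6.3 and 6.8.6: a nonzero Your Discriminator selects the session directly, while a zero Your Discriminator is accepted only in Down or AdminDown packets, and the session must then be chosen by some application-dependent key. Using the source address as that key is purely an assumption for illustration, as are the class and method names; RFC 5880 leaves this choice open.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

ADMIN_DOWN, DOWN, INIT, UP = 0, 1, 2, 3  # State values, RFC 5880 section 4.1


@dataclass
class Session:
    local_discr: int       # bfd.LocalDiscr: unique among this system's sessions
    peer_addr: str         # hypothetical key used to select initial packets
    remote_discr: int = 0  # bfd.RemoteDiscr: learned from the peer's My Discriminator


@dataclass
class SessionTable:
    by_discr: Dict[int, Session] = field(default_factory=dict)
    by_peer: Dict[str, Session] = field(default_factory=dict)
    next_discr: int = 1

    def create(self, peer_addr: str) -> Session:
        # One discriminator per session, not one per node (RFC 5880 section 6.3).
        session = Session(self.next_discr, peer_addr)
        self.next_discr += 1
        self.by_discr[session.local_discr] = session
        self.by_peer[peer_addr] = session
        return session

    def demux(self, src_addr: str, my_discr: int, your_discr: int,
              state: int) -> Optional[Session]:
        """Select the session for a received Control packet, or None to discard."""
        if my_discr == 0:
            return None  # RFC 5880 section 6.8.6: My Discriminator must be nonzero
        if your_discr != 0:
            session = self.by_discr.get(your_discr)
        elif state not in (ADMIN_DOWN, DOWN):
            return None  # zero Your Discriminator is only valid while Down/AdminDown
        else:
            # Initial packet: selection is application dependent; keying on the
            # source address is an illustrative assumption.
            session = self.by_peer.get(src_addr)
        if session is not None:
            session.remote_discr = my_discr
        return session


table = SessionTable()
a = table.create("192.0.2.1")
b = table.create("192.0.2.2")
print(a.local_discr, b.local_discr)                # 1 2
print(table.demux("192.0.2.2", 77, 0, DOWN) is b)  # True
```

Keying initial packets on the source address alone allows only one session per peer, which is one way of seeing why the demultiplexing of initial packets matters when several sessions run between the same two systems.
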
Section 3.1

This section isn't clear about the distinction between "verifying forwarding in one direction only" and "not needing to provision the target node, only the source node": the first is a relaxation of the requirements on what BFD detects, the second is a strengthening of the requirements on how BFD can be configured.

Despite saying that the target would not need to be configured, as discussed in this section, BFD would still need to be configured at the target node to know the discriminator of the source node:

   When the targeted network entity receives the packet, it knows that BFD packet, based on the discriminator and processes it.

I do not understand the sense in which "unidirectional" is being used. It seems that the only need is to verify transmission in one direction between the two nodes. The target node can verify successful transmission if it receives the control packets from the source node. But the source node can only know that transmission is working if it receives reply packets from the target node. So even though transmission only needs to be tested in one direction, packets must be sent in both directions. Or is the purpose to send the live/dead determination to the "centralized controller", so that the source does not need to know the state of the path?

Section 3.2

The first paragraph is:

   BFD provides data delivery confidence when reachability validation is performed prior to traffic utilizing specific paths/LSPs. However this comes with a cost, where, traffic is prevented to use such paths/LSPs until BFD is able to validate the reachability, which could take seconds due to BFD session bring-up sequences [RFC5880], LSP ping bootstrapping [RFC5884], etc. This use case could be well supported by eliminating the need for session negotiation and discriminator exchanges in order to establish the BFD session.

As far as I can tell, the use case is "when reachability validation is performed prior to traffic utilizing specific paths/LSPs".

But the first sentence isn't structured to emphasize that, so it's difficult to tell what "This use case" means. Better would be something like:

   This use case is when BFD is used to verify reachability before sending traffic via a path/LSP. This comes with a cost, which is that traffic is prevented from using the path/LSP until BFD is able to validate the reachability, which could take seconds ... . This use case would be better supported by eliminating the need for the initial BFD session negotiation.

The second paragraph says "All it takes is for the network entities to know what the discriminator values to be used for the session." But as in section 2, the interval parameters must be configured as well before a BFD session is functioning.

Section 3.3

The last two paragraphs are:

   Traditional BFD session establishment and validation of the forwarding path must not become a bottleneck in the case of centralized traffic engineering. If the controller or other centralized entity is able to instantly verify a forwarding path of the TE tunnel , it could steer the traffic onto the traffic engineered tunnel very quickly thus minimizing adverse effect on a service. This is especially useful and needed when the scale of the network and number of TE tunnels is very high.

Don't use the word "instantly": Nothing happens "instantly" if it involves events at two or more physically distinct locations. (299,792,458 metres per second -- It's not just a good idea, it's the law!)

   The cost associated with BFD session negotiation and establishment of BFD sessions to identify valid paths is very high and providing network redundancy becomes a critical issue.

It would help to specify that the "cost" is primarily due to the time delay: "The cost associated with the time required for BFD session negotiation and ... is very high when providing network redundancy is a critical issue."

Section 3.4

The final paragraph is:

   To support this use case, BFD MUST be able to perform liveness detection initated from centralized controller for any given segment under its domain.

This isn't a requirement on BFD per se; it's a requirement on the agents that implement BFD in nodes. But that is not a protocol requirement either, since this document isn't specifying a protocol between a centralized controller and a BFD agent. I think what is intended is that there should be a standard way by which a centralized controller can instruct the BFD agents in two nodes to initiate a BFD session along a path, and can then monitor whether the BFD session determines that the path between the nodes is working. But if so, that should be stated clearly.

Section 3.5

The final paragraph is:

   The established BFD session parameters and attributes like transmission interval, receiver interval, etc., MUST be modifiable without changing the state of the session.

Unfortunately, the term "state" has this definition (RFC 5880 section 4.1):

   State (Sta)

      The current BFD session state as seen by the transmitting system. Values are:

         0 -- AdminDown
         1 -- Down
         2 -- Init
         3 -- Up

It seems to me that the requirement is better captured by the last sentence of the preceding paragraph: "In these scenarios, it is desirable for BFD to slow down, speed up, stop or resume at will witho minimal [sic] additional BFD packets exchanged to establish a new or modified session."

But that sentence is not quite good enough, since what the preceding part of the paragraph asked for was "... with no additional BFD packets exchanged", whereas the final sentence says "minimal". What is the requirement? If it is "no additional packets", that's clear. If what is needed is a reduction in the additional packets, it would help if there were an analysis of how many additional packets are now needed and what potential reduction might be obtained, so that the reader has some idea what "minimal" means.

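For reference when quantifying "slow down" and "speed up", the following Python sketch shows how the RFC 5880 interval parameters determine a system's transmit interval (section 6.8.7, with the one-second floor of section 6.8.3 while the session is not Up) and its asynchronous-mode Detection Time (section 6.8.4). Jitter, Demand mode, the Echo function, and the Poll Sequence (section 6.5) by which RFC 5880 changes these parameters on an Up session are not modeled; the function names and the 50 ms configuration are hypothetical.

```python
ONE_SECOND_US = 1_000_000  # BFD interval fields are in microseconds (RFC 5880 section 4.1)


def tx_interval_us(desired_min_tx_us, remote_min_rx_us, session_up):
    """Interval between this system's Control packets, before jitter.

    RFC 5880 section 6.8.7: no less than the larger of bfd.DesiredMinTxInterval
    and bfd.RemoteMinRxInterval. Section 6.8.3: DesiredMinTxInterval is at
    least one second while the session is not Up.
    """
    if not session_up:
        desired_min_tx_us = max(desired_min_tx_us, ONE_SECOND_US)
    return max(desired_min_tx_us, remote_min_rx_us)


def detection_time_us(required_min_rx_us, remote_desired_min_tx_us, remote_detect_mult):
    """Asynchronous-mode Detection Time at the local system (RFC 5880 section 6.8.4)."""
    return remote_detect_mult * max(required_min_rx_us, remote_desired_min_tx_us)


# Hypothetical configuration: 50 ms intervals and Detect Mult 3 on both ends.
print(tx_interval_us(50_000, 50_000, session_up=True))   # 50000
print(tx_interval_us(50_000, 50_000, session_up=False))  # 1000000
print(detection_time_us(50_000, 50_000, 3))              # 150000
# The peer slows its Desired Min TX Interval to 300 ms:
print(detection_time_us(50_000, 300_000, 3))             # 900000
```
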
Section 3.6

First, this use case needs to make it clear what it is testing: that a source node can send a packet to an anycast address, and that the target node to which the packet is delivered can send a response packet to the source node. Of course, baseline BFD doesn't verify that, because it does not provide for a set of BFD agents to collectively form one endpoint of a BFD session.

Within that goal, there is an additional requirement that there is no need to establish separate BFD sessions between the source node and every node that hosts the anycast address. But there is an ambiguity -- is it required that target nodes that do not happen to receive any of the BFD packets do not need to maintain any state, or is it that the source node does not need to maintain separate state for each target node?

Section 3.7

This section talks about fault isolation very abstractly. Is there a definition as to what constitutes fault isolation? (Or is this definition well-known in the routing world?)

Section 3.8

   With distributed architectures of BFD implementations, this can be protected, if a node was to run multiple BFD sessions to targets, hosted on different parts of the system (ex: different CPU instances). This can reduce BFD false failures, resulting in more stable network.

This is true, but it is not clear what the new requirements are. I see in RFC 5880 section 6.3:

   Since multiple BFD sessions may be running between two systems, there needs to be a mechanism for demultiplexing received BFD packets to the proper session.
   ...
   The method of demultiplexing the initial packets (in which Your Discriminator is zero) is application dependent, and is thus outside the scope of this specification.

Is the question one of how to demultiplex the initial packets from multiple BFD sessions in the same source device?

Section 3.9

[no complaints]

Section 4

It would help if there were cross-references between the scenarios/use cases and the requirements.

REQ#1

"MUST start processing for the discriminator" is unclear. Does this mean "MUST establish a session", "MUST be able to send a response", or what?

REQ#2

See comments on section 3.1.

REQ#3

Does this include not needing to exchange interval parameters as well?

REQ#4

I suspect this requirement is only applicable in the scenario of section 3.4, a Segment Routed network. It might be useful to qualify the requirement this way, since otherwise "centralized controller" and "segment" don't have a context. Or is S-BFD only intended for situations with a centralized controller?

REQ#5

See comments for section 3.5.

REQ#6

"This requirement does not require BFD session establishment with every node hosting the anycast address." is not what is intended. Rather, it should be something like appending "... without establishing a separate BFD session with every node hosting the anycast address" to the first sentence. As written, the requirement "does not require session establishment with every node", whereas the intention is to "require that there not be session establishment with every node".

REQ#7

See comments for section 3.7.