Frank,
Only in regard to: ...FB: Thanks for the suggested updates, which would definitely improve the text. One problem about SFC performance remains though IMHO. All the text so far is focused on the connectivity within a SFC - not the service itself. I.e., if you'd consider a "laundry service" - we focus a lot on how long it takes to get the clothes shipped to and from the washing machine, but we don't focus on how well the washing machine washes the clothes. IMHO we should either expand on the performance of the SFC and SF wrt/ the service (especially given that you define a service layer in section 2) - or clearly state that the framework would just focus on connectivity between SFs.
And ...FB: See above. This would be about adding a capability to assess how well the washing machine washes my laundry.
Please follow the discussion at these threads:
Some of the resulting text is at:
Best,
Carlos.
Hi Nagendra,
Thanks for the detailed reply. Please see inline (...FB).
-----Original Message-----
From: Nagendra Kumar Nainar (naikumar) <naikumar@xxxxxxxxx>
Sent: Saturday, 16 May 2020 16:16
To: Frank Brockners (fbrockne) <fbrockne@xxxxxxxxx>; tsv-art@xxxxxxxx
Cc: sfc@xxxxxxxx; last-call@xxxxxxxx; draft-ietf-sfc-oam-framework.all@xxxxxxxx
Subject: Re: Tsvart telechat review of draft-ietf-sfc-oam-framework-13
Hi Frank,
Thank you for the review. Please see inline for the response.
Reviewer: Frank Brockners Review result: Ready with Nits
This document has been reviewed as part of the transport area review team's ongoing effort to review key IETF documents. These comments were written primarily for the transport area directors, but are copied to the document's authors and WG to allow them to address any issues raised and also to the IETF discussion list for information.
When done at the time of IETF Last Call, the authors should consider this review as part of the last-call comments they receive. Please always CC tsv-art@xxxxxxxx if you reply to or forward this review.
This document provides a reference framework for OAM for SFC.
Comments:
Section 3.1.1 SF availability: The text makes explicit reference to multiple instances of a SF. Consequently, it should be defined how availability of a SF is computed/determined in case multiple instances are deployed.
<Nagendra> This is already clarified in the section as below:
"For cases where multiple instances of an SF are used to realize a given SF for the purpose of load sharing, SF availability can be performed by checking the availability of any one of those instances, or the availability check may be targeted at a specific instance."
This further leads to the question, whether availability is always a "binary" state (available / not-available), or could a SF be e.g. 99% available?
<Nagendra> The availability is measured as a binary state. I am not sure what "99% available" means. If it means getting 99 responses for 100 probes sent, I think it falls under the packet loss category, which in turn is performance measurement.
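The distinction drawn above between binary availability and packet loss can be sketched as follows. This is only an illustration and not text from the draft: the function names and the rule "available if at least one probe is answered" are assumptions made for the example.

```python
from typing import Sequence


def is_available(probe_replies: Sequence[bool]) -> bool:
    """Binary availability (assumed rule): True if at least one probe was answered."""
    return any(probe_replies)


def loss_ratio(probe_replies: Sequence[bool]) -> float:
    """Fraction of probes that went unanswered (a performance metric)."""
    if not probe_replies:
        raise ValueError("no probes sent")
    lost = sum(1 for replied in probe_replies if not replied)
    return lost / len(probe_replies)


# 99 responses received for 100 probes sent (hypothetical data).
replies = [True] * 99 + [False]
print(is_available(replies))  # binary answer
print(loss_ratio(replies))    # loss expressed as a ratio
```

Under these assumptions the SF is reported as available, and the one missing response only shows up in the loss metric.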
...FB: Thanks. Though I'm still not entirely following. If availability is binary and I put the statements above together, what would be the availability of the following setup: There is an SF that is made up of 100 instances. 99 of these instances are powered down entirely. And the 1 instance that is "up" is alternating between servicing requests for 10 min followed by not servicing requests for 10 min. Would the SF be considered "available"?
Section 3.1.2 SF performance: What is the impact of a "multiple instance SF deployment" on SF performance measurement?
<Nagendra> I think we covered this in SF availability but not here. Does the updated text below look better?
OLD: On the one hand, the performance of any specific SF can be quantified by measuring the loss and delay metrics of the traffic from SFF to the respective SF, while on the other hand, the performance can be measured by leveraging the loss and delay metrics from the respective SFs. The latter requires SF involvement to perform the measurement while the former does not.
NEW: On the one hand, the performance of any specific SF can be quantified by measuring the loss and delay metrics of the traffic from SFF to the respective SF, while on the other hand, the performance can be measured by leveraging the loss and delay metrics from the respective SFs. The latter requires SF involvement to perform the measurement while the former does not. For cases where multiple instances of an SF are used to realize a given SF for the purpose of load sharing, SF performance can be quantified by measuring the metrics for any one instance of SF or by measuring the metrics for a specific instance.
The section only talks about loss and delay as performance criteria. It would be good to state that other performance criteria (e.g. specific to the SF, throughput, etc.) exist.
<Nagendra> We can add the below to Section 3.1.2:
NEW: "The metrics measured to quantify the performance of the SF component are not limited to loss and delay. Other metrics such as throughput also exist, and the choice of metrics for performance measurement is outside the scope of this document."
Section 3.2.1 SFC availability: The current definition is very focused on connectivity verification, i.e. it tries to answer the question: "Does my SFC transport packets?". IMHO we should also ask the question "Does my SFC process the packets correctly?" - because if packets are not processed per the SFC definition, we might not call the SFC available.
<Nagendra> I think this is already handled by SF availability. The end-to-end SFC availability is verified by steering the OAM packet over the ordered set of SFs within the SFC. This is more like daisy chaining the availability of SFs within the SFC to determine end-to-end SFC availability. If the derived solution verifies the SF availability not just based on the uptime but based on the service treatment, it also answers the question "Does my SFC process the packets correctly". Let us know if there is any further clarity required.
While 3.2.2 states that "any SFC-aware network device should have the ability to make performance measurements" a similar statement isn't found in 3.2.1. IMHO the ability for availability checks is probably a prerequisite for performance measurement.
<Nagendra> The ability to perform end-to-end or partial SFC availability verification is already mentioned in section 3.2.1 as below:
" In order to perform service connectivity verification of an SFC/SFP, the OAM functions could be initiated from any SFC-aware network devices of an SFC-enabled domain for end-to-end paths, or partial paths terminating on a specific SF, within the SFC/SFP"
Please let us know if you have any suggestion to improve if there is a lack of clarity.
Section 3.2.2 SFC performance measurement: The section only mentions the need for performance measurement. It misses the definition of what SFC performance measurement is.
<Nagendra>
...FB: Thanks for the suggested updates, which would definitely improve the text. One problem about SFC performance remains though IMHO. All the text so far is focused on the connectivity within a SFC - not the service itself. I.e., if you'd consider a "laundry service" - we focus a lot on how long it takes to get the clothes shipped to and from the washing machine, but we don't focus on how well the washing machine washes the clothes. IMHO we should either expand on the performance of the SFC and SF wrt/ the service (especially given that you define a service layer in section 2) - or clearly state that the framework would just focus on connectivity between SFs.
Section 3.3. Classifier component: The section mentions the need for the ability to perform performance measurement of the classifier component. What is performance measurement of the classifier? What does performance measurement of the classifier component comprise?
<Nagendra>We can add the below text:
OLD: Any SFC-aware network device should have the ability to perform performance measurement of the classifier component for each SFC.
NEW: Any SFC-aware network device should have the ability to perform performance measurement of the classifier component for each SFC. The performance can be quantified by measuring the performance metrics of the traffic from the classifier for each SFC/SFP.
Section 3.4. / 3.5. Availability/PM of the underlay and overlay network: It would be good to add a sentence stating that the availability/PM mechanisms offered by the technologies used by the overlay/underlay are used, rather than defining new methods specifically for SFC.
<Nagendra>Yes, that makes sense. Please check the below text:
OLD: Any SFC-aware network device may have the ability to perform availability check or performance measurement of the overlay network.
NEW: Any SFC-aware network device may have the ability to perform availability check or performance measurement of the overlay network. Any existing OAM tools and techniques can be leveraged for this purpose.
Section 4. SFC OAM Functions: It would be good, if examples in section 4 could also include more "recent" methods such as OWAMP/TWAMP (RFC4656, RFC 5357).
<Nagendra>
OLD: Delay within an SFC could be measured based on the time it takes for a packet to traverse the SFC from the ingress SFC node to the egress SFF. As SFCs are unidirectional in nature, measurement of one-way delay [RFC7679] is important. In order to measure one-way delay, time synchronization MUST be supported by means such as NTP, PTP, GPS, etc.
NEW: Delay within an SFC could be measured based on the time it takes for a packet to traverse the SFC from the ingress SFC node to the egress SFF. Measurement protocols such as One-way Active Measurement Protocol (OWAMP) [RFC4656], Two-way Active Measurement Protocol (TWAMP) [RFC5357] can be used to measure the characteristics. As SFCs are unidirectional in nature, measurement of one-way delay [RFC7679] is important. In order to measure one-way delay, time synchronization MUST be supported by means such as NTP, Precision Time Protocol (PTP), GPS, etc.
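Why one-way delay needs time synchronization can be shown with a minimal sketch. It is not OWAMP or TWAMP; the function name, the microsecond units, and the numbers are hypothetical and chosen only to show how a receiver clock offset leaks into the measured value.

```python
def measured_one_way_delay_us(true_delay_us: int,
                              receiver_clock_offset_us: int,
                              send_time_us: int = 1_000_000) -> int:
    """Delay as computed from timestamps taken on two different clocks.

    The sender stamps the packet with its own clock; the receiver stamps
    arrival with a clock that runs receiver_clock_offset_us ahead of the sender's.
    """
    tx_timestamp = send_time_us
    rx_timestamp = send_time_us + true_delay_us + receiver_clock_offset_us
    return rx_timestamp - tx_timestamp


# Synchronized clocks: the measurement equals the true delay.
print(measured_one_way_delay_us(5_000, 0))
# Receiver clock 2 ms ahead: the offset is added to the measured delay.
print(measured_one_way_delay_us(5_000, 2_000))
```

Any residual offset between the ingress and egress clocks appears directly as error in the one-way delay, which is why NTP, PTP, or GPS are mentioned as means of synchronization.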
Section 4.4. Performance Measurement: Focus is entirely on the PM of the connectivity, rather than on the SF. How about covering PM for the SF as well?
<Nagendra> I am not sure I understand what is missing. Do you have any suggestion for the text improvement?
...FB: See above. This would be about adding a capability to assess how well the washing machine washes my laundry.
Section 5.1 OAM Tool Gap Analysis:
- Not sure what "NVo3 OAM" refers to. Could that be explained below the table and in section 1.2.1?
<Nagendra> Combining this with the other queries below, as they appear to be related.
- E-OAM needs to be detailed. It seems that CFM (802.1ag) and not 802.3ah is referred to here.
<Nagendra> Per my understanding, 802.3ah is 1-hop while 802.1ag can be more than 1 hop, and both use Ethernet frames. So I think both are applicable here. My response regarding E-OAM details in this section is combined below.
...FB: Maybe I missed it - but I don't see text that refers to CFM or EFM OAM. Where is this covered? IMHO we would need references to IEEE standards to avoid confusion.
- "Trace" in the "Trace" column needs to be expanded on. Is this traceroute? Paris-Traceroute? IOAM Loopback?
IPPM needs to be detailed, because IPPM is not a tool as such but an IETF WG. Does this refer to OWAMP/TWAMP/etc. as defined by IPPM?
<Nagendra> Combining the above queries.
OLD: There are various OAM tool sets available to perform OAM functions within various layers. These OAM functions may be used to validate some of the underlay and overlay networks. Tools like ping and trace are in existence to perform connectivity check and tracing of intermediate hops in a network. These tools support different network types like IP, MPLS, TRILL, etc. There is also an effort to extend the tool set to provide connectivity and continuity checks within overlay networks. BFD is another tool which helps in detecting data forwarding failures. Table 3 below is not exhaustive
NEW: There are various OAM tool sets available to perform OAM functions within various layers. These OAM functions may be used to validate some of the underlay and overlay networks. Tools like ping and trace are used to perform connectivity check and tracing of intermediate hops in a network. These tools are already available for different types of networks such as IP, MPLS, TRILL, etc.
E-OAM offers OAM mechanisms such as an Ethernet continuity check for Ethernet links. There is an effort around NVO3 OAM to provide connectivity and continuity checks for networks that use NVO3. BFD is used for the detection of data plane forwarding failures.
...FB: Check whether the NVO3 WG will indeed deliver a solution and "NVO3 OAM" indeed exists. If in doubt, it might be better to avoid forward-looking references. Per my note above, it would be good to explicitly refer to IEEE standards as opposed to introducing a new term like "E-OAM".
The IPPM framework [RFC2330] offers tools such as OWAMP [RFC4656] and TWAMP [RFC5357] (collectively referred to as IPPM in this section) to measure various performance metrics. MPLS Packet Loss Measurement (LM) and Packet Delay Measurement (DM) (collectively referred to as MPLS_PM in this section) [RFC6374] offer the ability to measure performance metrics in MPLS networks.
Table 3 below is not exhaustive.
Section 6.4.3 IOAM:
- The section states that IOAM "may be used to perform various SFC OAM functions as well". It would be good to expand on this statement: E.g. IOAM Trace-Option Type could be leveraged for SFC tracing. IOAM Direct-Export Option Type could be leveraged.
- How would we deal with the IOAM Active Flag (draft-ietf-ippm-ioam-flags-01) when used with SFC OAM?
<Nagendra> The intention of the section is to highlight the applicability of different OAM toolsets for OAM functions at the service layer. I am not sure if we really should try explaining all the possible options within each tool. But I agree that it is worth clarifying the availability of IOAM options for tracing. I think we can clarify that different IOAM Option-Types are available for OAM functions such as SFC tracing. Can you check if the below looks ok?
OLD: [I-D.ietf-sfc-ioam-nsh] defines how In-Situ OAM data fields are transported using NSH header. [I-D.ietf-sfc-proof-of-transit] defines a mechanism to perform proof of transit to securely verify if a packet traversed the relevant SFP or SFC. While the mechanism is defined inband (i.e., it will be included in data packets), it may be used to perform various SFC OAM functions as well.
NEW: [I-D.ietf-sfc-ioam-nsh] defines how In-Situ OAM data fields are transported using NSH header. [I-D.ietf-sfc-proof-of-transit] defines a mechanism to perform proof of transit to securely verify if a packet traversed the relevant SFP or SFC. While the mechanism is defined inband (i.e., it will be included in data packets), IOAM Option-Types such as IOAM Trace Option-Types can also be used to perform other SFC OAM functions such as SFC tracing.
- The text states "In-Situ OAM could be used with O bit set": Why would IOAM be used with the overflow bit set for SFC OAM? For details on IOAM's O-bit, see section 4.4.1 in https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09.
<Nagendra> The O bit referred here is not the O bit in IOAM but the one in NSH/Overlay header. To avoid any confusion, this can be updated as below:
OLD: In-Situ OAM could be used with O bit set to perform SF availability and SFC availability or performance measurement.
NEW: In-Situ OAM could be used with O bit in the overlay header set, to perform SF availability and SFC availability or performance measurement.
...FB: Ah, ok. Given that this section is about IOAM and not NSH, I'd rather explicitly refer to NSH here. E.g. if SFC is realized using NSH, then the O-bit in the NSH header could be used to indicate OAM traffic. You could refer to https://tools.ietf.org/html/draft-ietf-sfc-ioam-nsh-03#section-4.2 explicitly.
Section 6.4.4 SFC Traceroute:
- This section refers to an expired draft (even calling out the fact that the draft has expired), but also mentions that functionality is available and implemented in OpenDaylight. Consider removing the references to the expired draft and rather add references to OpenDaylight documents.
- IOAM Loopback (see draft-ietf-ippm-ioam-flags-01) could apply to SFC Traceroute as well.
<Nagendra>Ok. Let me check if I can find some reference for ODL.
Detailed set of nits that I encountered while reading through the document ([x] references line number x) – hope that they are helpful in further improving the doc:
<Nagendra> Yes, of course.
[global] s/an SF/a SF/ -- and similarly SFC/SFF
<Nagendra> Other RFCs use "an SF/SFF", so the draft is updated accordingly. If your suggestion is to replace "a SF" with "an SF", it is done.
[176] "OAM Controller" not defined
<Nagendra>We can change it as below:
OLD: OAM controllers are assumed to be within the same administrative domain as the target SFC enabled domain.
NEW: OAM controllers are SFC-aware network devices that are capable of generating OAM packets. They are assumed to be within the same administrative domain as the target SFC enabled domain.
[202] Why just Virtual Machines and no containers? Suggest to make things generic and talk about virtual and physical entities.
<Nagendra> We changed this as virtual entities.
This comment applies throughout the document.
[216] Ethernet OAM: Add reference. Do you refer to physical layer Ethernet OAM (802.3ah) or CFM (802.1ag)?
<Nagendra> The response was provided in the above comment section.
[243] s/uses the overlay network/uses the overlay network layer/
<Nagendra> Done.
[246] Could we add a few examples of "various overlay network technologies"? For the underlay network layer several examples are listed.
<Nagendra> Ok.
[248] What does "mostly transparent" mean?
<Nagendra> The data plane elements connecting the overlay layer nodes may not always process the overlay header.
...FB: How about we explain this in the document?
[254] What does "tight coupling" between the link layer and the physical technology mean?
<Nagendra>I am not sure I understand the nit here. Do you see any difficulty in parsing the sentence?
...FB: Not sure what "tight coupling" means here. Could you clarify what is "tight coupling" vs. "not tight coupling"?
[255] Suggest to avoid terms like "popular" - popularity can change, standards stay.
<Nagendra> Ok. This is changed as "Ethernet is one such choice..."
[256] Acronyms "POS" and "DWDM" are not defined
<Nagendra> Added.
[274] Link start/end-points don't seem to always align with the underlay network in the diagram
<Nagendra> Fixed it.
[287] s/may comprise of/may consist of/
<Nagendra> We fixed it as "may comprise".
[288] s/but not shown/but is not shown/
<Nagendra> We fixed this as "intermediate nodes not shown...".
[307] s/devices/device/
<Nagendra> Done.
[308] What is a "controller"?
<Nagendra> We discussed this in the above comment section.
[314] s/includes/include/
<Nagendra>Done.
[319] Add hSFC to list of acronyms in section 1.2.1
<Nagendra> This is expanded in the respective section. We added it in the acronym section as well.
[320] Add IBN to list of acronyms in section 1.2.1
<Nagendra> Ok, Done.
[325] s/includes/include/
<Nagendra> Done.
[359] The function/term "controller" requires definition.
<Nagendra> Done, as mentioned in the above comment section.
[383] s/?./?/
[398] s/get the got/got/
<Nagendra> Done.
[461] s/devices/device/
<Nagendra> Done.
[469] Does it have to be equal cost multipath at the service layer, or could unequal cost multipath also be an option for load-balancing?
<Nagendra>I didn’t see any discussion specific to ECMP/UCMP in the architecture RFC.
...FB: Hmm. I did not see that RFC7665 is only about equal cost multipath.
[521] Not sure whether the overlay network establishes the service plane. Isn't it that the overlay network establishes connectivity for the SFC-related functions in the service plane?
<Nagendra> The service layer is established over the overlay network layer. I am not sure if it is right to say the overlay network provides connectivity for the service layer.
...FB: Overlay network is one component of the service layer, isn't it? So it is required but not sufficient.
[531] s/components/component/
[545] remove "underlay"
<Nagendra> Done.
[595] s/devices/device/
<Nagendra> Done.
[600] s/action/an action/
<Nagendra> Done.
[601] Expand on "TTL or other means" (TTL also needs to be added to acronyms in 1.2.1). Is this specific to NSH? Or specific to IPv4?
<Nagendra> TTL is listed as a well-known abbreviation in https://www.rfc-editor.org/materials/abbrev.expansion.txt and so we left it as it is. TTL in this document refers to the NSH TTL field.
...FB: Let's ensure we refer to NSH TTL in this case. Given that SFC can be done with other means than NSH, implicit reference to NSH might be a problem.
[630] Mention that for "approximation of packet loss for a given SFC can be derived" to be applicable, SFC OAM packets would need to be forwarded the same as live user traffic.
<Nagendra> As the intent is to derive an approximate loss value, I am not sure if we need this additional consideration that the OAM packet would need to follow the live user traffic. Let me know if you think otherwise.
...FB: IMHO we should - given that it is one potential complication.
[636] Is uppercase "MUST" applicable to an informational document? Especially given that RFC2119/RFC8174 is explicitly referenced by the draft.
<Nagendra> Based on various reviewer comments, we removed the use of any normative statement.
[666] Add MPLS, TRILL to acronyms in 1.2.1
<Nagendra> Ok. Done.
[678] s/exhaustive/exhaustive./
<Nagendra> Done.
[720] Is uppercase "SHOULD" applicable to an informational document? Especially given that RFC2119/RFC8174 is explicitly referenced by the draft.
<Nagendra> Based on various reviewer comments, we removed the use of any normative statement.
[722] Is uppercase "MAY" applicable to an informational document? Especially given that RFC2119/RFC8174 is explicitly referenced by the draft.
<Nagendra> Based on various reviewer comments, we removed the use of any normative statement.
[754] s/packet/packets/
[755] s/to next node/to the next node/
[771] How does this requirement align with the earlier paragraph, e.g. in case a node sends an ICMP reply? It would probably make sense to scope the statement to e.g. NSH.
<Nagendra> As mentioned in the statement, the node that initiates the OAM packet must set the marker and so this statement is applicable for the initiating node.
[806] s/function/functions/
<Nagendra> Done
[809] s/from relevant node/from the relevant node/
<Nagendra> Done
[810] s/generate ICMP/generate an ICMP/
<Nagendra> Done
[812] s/from last/from the last/
<Nagendra> Done
[830] s/perform continuity/perform the continuity/
<Nagendra> Done
[834] s/with relevant/with the relevant/
<Nagendra> Done
[835] s/perform partial SFC availability./perform a partial SFC availability check./
<Nagendra> Done
[851] For "In-Situ OAM data fields" add a normative reference to draft-ietf-ippm-ioam-data
[905] Add "CLI" to section 1.2.1 acronyms
<Nagendra> Done
[920] Add a reference for NETCONF ->RFC6241
<Nagendra> Done
Once again, thanks a lot for the great comments.
Regards, Nagendra
Thanks again for considering the comments in great detail. Much appreciated.
Cheers, Frank