Re: [Last-Call] Tsvart telechat review of draft-ietf-sfc-oam-framework-13

So the question now is whether the text Murray suggested suffices for you? (We are still waiting to hear from Alvaro.)

Yours,
Joel

On 5/20/2020 1:41 PM, Frank Brockners (fbrockne) wrote:
Thanks Joel. Per what I mentioned below, let's be clear that SF performance is out of scope for the doc.
And I think this was Alvaro's point as well.

Cheers, Frank

-----Original Message-----
From: Joel M. Halpern <jmh@xxxxxxxxxxxxxxx>
Sent: Wednesday, 20 May 2020 19:21
To: Frank Brockners (fbrockne) <fbrockne@xxxxxxxxx>; Nagendra Kumar Nainar
(naikumar) <naikumar@xxxxxxxxx>; tsv-art@xxxxxxxx
Cc: sfc@xxxxxxxx; last-call@xxxxxxxx; draft-ietf-sfc-oam-framework.all@xxxxxxxx
Subject: Re: Tsvart telechat review of draft-ietf-sfc-oam-framework-13

Frank, regarding your comment about SF performance, I thought the document
was pretty clear that we consider that out of scope (cf. the discussions with the
various ADs.)

If you can see a place to add text, please propose text.

Thank you,
Joel

On 5/20/2020 1:10 PM, Frank Brockners (fbrockne) wrote:
Hi Nagendra,

Thanks for the detailed reply. Please see inline (..FB).

-----Original Message-----
From: Nagendra Kumar Nainar (naikumar) <naikumar@xxxxxxxxx>
Sent: Saturday, 16 May 2020 16:16
To: Frank Brockners (fbrockne) <fbrockne@xxxxxxxxx>; tsv-art@xxxxxxxx
Cc: sfc@xxxxxxxx; last-call@xxxxxxxx;
draft-ietf-sfc-oam-framework.all@xxxxxxxx
Subject: Re: Tsvart telechat review of
draft-ietf-sfc-oam-framework-13

Hi Frank,

Thank you for the review. Please see inline for the response.


      Reviewer: Frank Brockners
      Review result: Ready with Nits

      This document has been reviewed as part of the transport area review team's
      ongoing effort to review key IETF documents. These comments were written
      primarily for the transport area directors, but are copied to the document's
      authors and WG to allow them to address any issues raised and also to the IETF
      discussion list for information.

      When done at the time of IETF Last Call, the authors should consider this
      review as part of the last-call comments they receive. Please always CC
      tsv-art@xxxxxxxx if you reply to or forward this review.

      This document provides a reference framework for OAM for SFC.

      Comments:

      Section 3.1.1 SF availability: The text makes explicit reference to multiple
      instances of a SF. Consequently, it should be defined how availability of a SF
      is computed/determined in case multiple instances are deployed.

<Nagendra> This is already clarified in the section as below:

"For cases where
     multiple instances of an SF are used to realize a given SF for the
     purpose of load sharing, SF availability can be performed by checking
     the availability of any one of those instances, or the availability
     check may be targeted at a specific instance."

      This further leads to the question, whether availability is always a "binary"
      state (available / not-available), or could a SF be e.g. 99% available?

<Nagendra>The availability is measured as binary state. I am not sure
what 99% available means. If it means getting 99 responses for 100
probes sent, I think it falls under the packet loss category, which in turn is
performance measurement.

...FB: Thanks. Though I'm still not entirely following. If availability is binary and
I put the statements above together, what would be the availability of the
following setup: There is an SF that is made up of 100 instances. 99 of these
instances are powered down entirely. And the 1 instance that is "up" is
alternating between servicing requests for 10min followed by not servicing
requests for 10min. Would the SF be considered "available"?
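
To make the scenario above concrete, here is a minimal Python sketch. It is not taken from the draft. It assumes each probe yields a per-instance binary result and contrasts the "any one of those instances" rule quoted from Section 3.1.1 with a simple fraction of responsive instances. The function names and the probe snapshots are hypothetical.

```python
from typing import Sequence

def sf_available_any(instance_up: Sequence[bool]) -> bool:
    """Binary SF availability: True if at least one instance answered the probe."""
    return any(instance_up)

def sf_available_fraction(instance_up: Sequence[bool]) -> float:
    """Fraction of instances that answered the probe (0.0 for an empty list)."""
    if not instance_up:
        return 0.0
    return sum(instance_up) / len(instance_up)

# Hypothetical snapshots of the scenario: 100 instances, 99 powered down.
during_service_window = [True] + [False] * 99   # the one live instance is serving
during_outage_window = [False] * 100            # the one live instance is not serving

print(sf_available_any(during_service_window))       # True
print(sf_available_fraction(during_service_window))  # 0.01
print(sf_available_any(during_outage_window))        # False
```

Under the any-instance rule the binary answer flips depending on which 10-minute window the probe falls into, which is the ambiguity the question points at.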


      Section 3.1.2 SF performance: What is the impact of a "multiple instance SF
      deployment" on SF performance measurement?

<Nagendra>I think we covered this in SF availability but not here.
Does the below updated text look better?

OLD:
On the one hand, the performance of any specific SF can be quantified
     by measuring the loss and delay metrics of the traffic from SFF to
     the respective SF, while on the other hand, the performance can be
     measured by leveraging the loss and delay metrics from the respective
     SFs.  The latter requires SF involvement to perform the measurement
     while the former does not.

NEW:
On the one hand, the performance of any specific SF can be quantified
     by measuring the loss and delay metrics of the traffic from SFF to
     the respective SF, while on the other hand, the performance can be
     measured by leveraging the loss and delay metrics from the respective
     SFs.  The latter requires SF involvement to perform the measurement
     while the former does not. For cases where
     multiple instances of an SF are used to realize a given SF for the
     purpose of load sharing, SF performance can be quantified by measuring
     the metrics for any one instance of SF or by measuring the metrics for
     a specific instance.

      The section only talks about loss and delay as performance criteria. It would
      be good to state that other performance criteria (e.g. specific to the SF,
      throughput, etc.) exist.

<Nagendra> We can add the below to Section 3.1.2:

NEW:
"The metrics measured to quantify the performance of the SF component
are not limited to loss and delay. Other metrics such as
throughput also exist and the choice of metrics for performance
measurement is outside the scope of this document."

      Section 3.2.1 SFC availability: The current definition is very focused on
      connectivity verification, i.e. it tries to answer the question: "Does my SFC
      transport packets?". IMHO we should also ask the question "Does my SFC process
      the packets correctly?" - because if packets are not processed per the SFC
      definition, we might not call the SFC available.

<Nagendra> I think this is already handled by SF availability. The
end-to-end SFC availability is verified by steering the OAM packet
over the ordered set of SFs within the SFC. This is more like daisy
chaining the availability of SFs within the SFC to determine
end-to-end SFC availability. If the derived solution verifies the SF
availability not just based on the uptime but based on the service
treatment, it also answers the question "Does my SFC process the packets
correctly". Let us know if there is any further clarity required.
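
The "daisy chaining" described above can be sketched as follows. This is an illustration only, under the assumption that each SF check returns a binary result and that the SFC counts as available only if every SF in the ordered chain does. The SF names and the helper are hypothetical.

```python
from typing import Mapping

def sfc_available(sf_status: Mapping[str, bool]) -> bool:
    """Treat the SFC as available only if every SF in the chain is available."""
    return all(sf_status.values())

# Hypothetical ordered chain; dicts preserve insertion order in Python 3.7+.
chain = {"firewall": True, "nat": True, "dpi": False}

# First SF in chain order whose check failed, or None if all passed.
first_failure = next((name for name, ok in chain.items() if not ok), None)

print(sfc_available(chain))  # False
print(first_failure)         # dpi
```

Whether such a result also says anything about correct service treatment depends entirely on what the per-SF check verifies, which is the point under discussion.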

      While 3.2.2 states that "any SFC-aware network device should have the ability
      to make performance measurements" a similar statement isn't found in 3.2.1.
      IMHO the ability for availability checks is probably a prerequisite for
      performance measurement.

<Nagendra> The ability to perform end-to-end or partial SFC
availability verification is already mentioned in section 3.2.1 as below:

" In order to perform service connectivity verification of an SFC/SFP,
     the OAM functions could be initiated from any SFC-aware network
     devices of an SFC-enabled domain for end-to-end paths, or partial
     paths terminating on a specific SF, within the SFC/SFP"

Please let us know if you have any suggestion to improve if there is
a lack of clarity.

      Section 3.2.2 SFC performance measurement: The section only mentions the need
      for performance measurement. It misses the definition of what SFC performance
      measurement is.

<Nagendra>

...FB: Thanks for the suggested updates, which would definitely improve the
text. One problem about SFC performance remains though IMHO.
All the text so far is focused on the connectivity within a SFC - not the service
itself. I.e. If you'd consider a "laundry service" - we focus a lot on how long it
takes to get the clothes shipped to and from the washing machine, but we don't
focus on how well the washing machine washes the clothes.
IMHO we should either expand on the performance of the SFC and SF wrt/ the
service (especially given that you define a service layer in section 2) - or clearly
state that the framework would just focus on connectivity between SFs.



      Section 3.3. Classifier component: The section mentions the
      need for the ability to perform performance measurement of the classifier
      component. What is performance measurement of the classifier? What does
      performance measurement of the classifier component comprise?

<Nagendra>We can add the below text:

OLD:
Any SFC-aware network device should have the ability to perform
     performance measurement of the classifier component for each SFC.

NEW:
Any SFC-aware network device should have the ability to perform
     performance measurement of the classifier component for each SFC.
     The performance can be quantified by measuring the performance
     metrics of the traffic from the classifier for each SFC/SFP.

      Section 3.4. / 3.5. Availability/PM of the underlay and overlay network: It
      would be good to add a sentence stating that the mechanisms for availability/PM
      offered by the technologies used by the overlay/underlay are used, rather than
      defining new methods specifically for SFC.

<Nagendra>Yes, that makes sense. Please check the below text:

OLD:
Any SFC-aware network device may have the ability to perform
     availability check or performance measurement of the overlay network.

NEW:
Any SFC-aware network device may have the ability to perform
     availability check or performance measurement of the overlay network.
     Any existing OAM tools and techniques can be leveraged for this purpose.

      Section 4. SFC OAM Functions: It would be good if examples in section 4 could
      also include more "recent" methods such as OWAMP/TWAMP (RFC4656, RFC5357).

<Nagendra>

OLD:
Delay within an SFC could be measured based on the time it takes for
     a packet to traverse the SFC from the ingress SFC node to the egress
     SFF.  As SFCs are unidirectional in nature, measurement of one-way
     delay [RFC7679] is important.  In order to measure one-way delay,
     time synchronization MUST be supported by means such as NTP, PTP,
     GPS, etc.

NEW:
Delay within an SFC could be measured based on the time it takes for
     a packet to traverse the SFC from the ingress SFC node to the egress
     SFF.  Measurement protocols such as One-way Active Measurement
      Protocol (OWAMP) [RFC4656], Two-way Active Measurement Protocol
     (TWAMP) [RFC5357] can be used to measure the characteristics. As
     SFCs are unidirectional in nature, measurement of one-way
     delay [RFC7679] is important.  In order to measure one-way delay,
     time synchronization MUST be supported by means such as NTP,
     Precision Time Protocol (PTP), GPS, etc.
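
As a small illustration of why the text ties one-way delay to time synchronization, the sketch below computes one-way delay from a send timestamp taken on the sender's clock and a receive timestamp taken on the receiver's clock. It is not OWAMP or TWAMP, just the underlying arithmetic. The function name, the integer-microsecond timestamps and the example values are assumptions made for the illustration.

```python
def one_way_delay_us(t_sent_us: int, t_received_us: int, clock_offset_us: int = 0) -> int:
    """One-way delay in microseconds.

    t_sent_us is read from the sender's clock, t_received_us from the
    receiver's clock. clock_offset_us is (receiver clock - sender clock);
    it is zero only if the two clocks are perfectly synchronized.
    """
    return (t_received_us - clock_offset_us) - t_sent_us

# Hypothetical example: true delay 5 ms, receiver clock runs 2 ms ahead.
true_delay_us = 5_000
receiver_ahead_us = 2_000
t_sent_us = 1_000_000
t_received_us = t_sent_us + true_delay_us + receiver_ahead_us

naive = one_way_delay_us(t_sent_us, t_received_us)
corrected = one_way_delay_us(t_sent_us, t_received_us, receiver_ahead_us)

print(naive)      # 7000
print(corrected)  # 5000
```

Any unknown clock offset shows up directly as measurement error, which is why means such as NTP, PTP or GPS are listed.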

      Section 4.4. Performance Measurement: Focus is entirely on the PM of the
      connectivity, rather than on the SF. How about covering PM for the SF as well?

<Nagendra> I am not sure I understand what is missing. Do you have
any suggestion for the text improvement?

...FB: See above. This would be about adding a capability to assess how well
the washing machine washes my laundry.


      Section 5.1 OAM Tool Gap Analysis:
       - Not sure what "NVo3 OAM" refers to. Could that be explained below the table
       and in section 1.2.1?

<Nagendra> Combining this with other below queries as they appear to
be related.

- E-OAM needs to be detailed. It seems that CFM
       (802.1ag) and not 802.3ah is referred to here.

<Nagendra> Per my understanding, 802.3ah is 1-hop while 802.1ag can be
more than 1 hop, and both use Ethernet frames. So I think both are
applicable here.
My response regarding E-OAM details in this section is combined below.

...FB: Maybe I missed it - but I don't see text that refers to CFM or EFM OAM.
Where is this covered? IMHO we would need references to IEEE standards to
avoid confusion.


- "Trace" in the "Trace" column
       needs to be expanded on. Is this traceroute? Paris-Traceroute? IOAM Loopback?

       IPPM needs to be detailed, because IPPM is not a tool as such but an IETF WG.
       Does this refer to OWAMP/TWAMP/etc. as defined by IPPM?

<Nagendra> Combining the above queries.

OLD:
There are various OAM tool sets available to perform OAM functions
     within various layers.  These OAM functions may be used to validate
     some of the underlay and overlay networks.  Tools like ping and trace
     are in existence to perform connectivity check and tracing of
     intermediate hops in a network.  These tools support different
     network types like IP, MPLS, TRILL, etc.  There is also an effort to
     extend the tool set to provide connectivity and continuity checks
     within overlay networks.  BFD is another tool which helps in
     detecting data forwarding failures.  Table 3 below is not exhaustive

NEW:
There are various OAM tool sets available to perform OAM functions
     within various layers.  These OAM functions may be used to validate
     some of the underlay and overlay networks.  Tools like ping and trace
     are used to perform connectivity check and tracing of
     intermediate hops in a network.  These tools are already available for
     different types of networks such as IP, MPLS, TRILL, etc.

E-OAM offers OAM mechanisms such as an Ethernet continuity check for
Ethernet links. There is an effort around NVO3 OAM to provide
connectivity and continuity checks for networks that use NVO3.  BFD
is used for the detection of data plane forwarding failures.

...FB: Check whether NVO3 WG will indeed deliver a solution and "NVO3 OAM"
indeed exists. If in doubt, it might be better to avoid forward-looking
references. Per my note above, it would be good to explicitly refer to IEEE
standards as opposed to introducing a new term like "E-OAM".


The IPPM framework [RFC2330] offers tools such as OWAMP [RFC4656]
and TWAMP [RFC5357] (collectively referred to as IPPM in this section)
to measure various performance metrics. MPLS Packet Loss Measurement
(LM) and Packet Delay Measurement (DM) (collectively referred to as
MPLS_PM in this section) [RFC6374] offer the ability to measure
performance metrics in an MPLS network.

Table 3 below is not exhaustive.

      Section 6.4.3 IOAM:
      - The section states that IOAM "may be used to perform various SFC OAM
      functions as well". It would be good to expand on this statement: E.g. IOAM
      Trace-Option Type could be leveraged for SFC tracing. IOAM Direct-Export Option
      Type could be leveraged. - How would we deal with the IOAM Active Flag
      (draft-ietf-ippm-ioam-flags-01) when used with SFC OAM?

<Nagendra> The intention of the section is to highlight the
applicability of different OAM toolsets for OAM functions at service
layer. I am not sure if we really should try explaining all the
possible options within each tool. But I agree that it is worth
clarifying the availability of IOAM options for tracing. I think we can
clarify that different IOAM Option-Types are available for OAM functions
such as SFC tracing. Can you check if the below looks ok?

OLD:
[I-D.ietf-sfc-ioam-nsh] defines how In-Situ OAM data fields are
     transported using NSH header.  [I-D.ietf-sfc-proof-of-transit]
     defines a mechanism to perform proof of transit to securely verify if
     a packet traversed the relevant SFP or SFC.  While the mechanism is
     defined inband (i.e., it will be included in data packets), it may be
     used to perform various SFC OAM functions as well.

NEW:
[I-D.ietf-sfc-ioam-nsh] defines how In-Situ OAM data fields are
     transported using NSH header.  [I-D.ietf-sfc-proof-of-transit]
     defines a mechanism to perform proof of transit to securely verify if
     a packet traversed the relevant SFP or SFC.  While the mechanism is
     defined inband (i.e., it will be included in data packets), IOAM Option-Types
     such as IOAM Trace Option-Types can also be used to perform other
     SFC OAM functions such as SFC tracing.

      - The text states "In-Situ OAM could be used with O bit set": Why would IOAM be
      used with the overflow bit set for SFC OAM? For details on IOAM's O-bit, see
      section 4.4.1 in
      https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09.

<Nagendra> The O bit referred here is not the O bit in IOAM but the
one in NSH/Overlay header. To avoid any confusion, this can be updated as
below:

OLD:
In-Situ OAM could be used with O bit set to perform SF availability
     and SFC availability or performance measurement.

NEW:
In-Situ OAM could be used with O bit in the overlay header set, to perform
     SF availability and SFC availability or performance measurement.

... FB: Ah, ok. Given that this section is about IOAM and not NSH, I'd rather
explicitly refer to NSH here. E.g. If SFC is realized using NSH, then the O-bit in the
NSH header could be used to indicate OAM traffic. You could refer to
https://tools.ietf.org/html/draft-ietf-sfc-ioam-nsh-03#section-4.2 explicitly.
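
For readers who want to see which bit is meant, here is a minimal Python sketch of reading and setting the O bit in the first 32-bit word of an NSH base header. It assumes the base header layout of RFC 8300 (2-bit Version, then the O bit, one unassigned bit, 6-bit TTL, 6-bit Length, 4 unassigned bits, 4-bit MD Type, 8-bit Next Protocol). The helper names and the example header values are hypothetical and do not come from the draft under review.

```python
import struct

O_BIT_MASK = 0x20000000   # third most significant bit of the first 32-bit word
TTL_SHIFT = 22            # TTL occupies the 6 bits that follow Ver, O and U
TTL_MASK = 0x3F

def _first_word(nsh_header: bytes) -> int:
    """Return the first 32-bit word of the header in host integer form."""
    if len(nsh_header) < 4:
        raise ValueError("need at least 4 bytes of NSH base header")
    (word,) = struct.unpack("!I", nsh_header[:4])
    return word

def is_oam_packet(nsh_header: bytes) -> bool:
    """True if the O bit marks this packet as an OAM packet."""
    return bool(_first_word(nsh_header) & O_BIT_MASK)

def set_oam_bit(nsh_header: bytes) -> bytes:
    """Return a copy of the header with the O bit set; other bits untouched."""
    word = _first_word(nsh_header) | O_BIT_MASK
    return struct.pack("!I", word) + nsh_header[4:]

def nsh_ttl(nsh_header: bytes) -> int:
    """Extract the 6-bit NSH TTL field."""
    return (_first_word(nsh_header) >> TTL_SHIFT) & TTL_MASK

# Hypothetical first word: Ver=0, O=0, TTL=63, Length=6, MD Type=1, Next Protocol=1.
base = struct.pack("!I", (63 << 22) | (6 << 16) | (1 << 8) | 1)
marked = set_oam_bit(base)

print(base.hex(), is_oam_packet(base))      # 0fc60101 False
print(marked.hex(), is_oam_packet(marked))  # 2fc60101 True
print(nsh_ttl(marked))                      # 63
```

The same word also carries the NSH TTL, which is distinct from the IP TTL of any outer transport header.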


      Section 6.4.4 SFC Traceroute: - This section refers to an expired draft (even
      calling out the fact that the draft has expired), but also mentions that
      functionality is available and implemented in OpenDaylight. Consider removing
      the references to the expired draft and rather add references to OpenDaylight
      documents. - IOAM Loopback (see draft-ietf-ippm-ioam-flags-01) could apply to
      SFC Traceroute as well.

<Nagendra>Ok. Let me check if I can find some reference for ODL.

      Detailed set of nits that I encountered while reading through the document
      ([x] references line number x) - hope that they are helpful in further
      improving the doc:

<Nagendra> Yes, of course.

      [global] s/an SF/a SF/ -- and similarly SFC/SFF

<Nagendra>Other RFCs use "an SF/SFF". So the draft is updated
accordingly. If your suggestion is to change "a SF" to "an SF", it is done.

      [176] "OAM Controller" not defined

<Nagendra>We can change it as below:

OLD:
OAM controllers are assumed to be within the same administrative
     domain as the target SFC enabled domain.

NEW:
OAM controllers are SFC-aware network devices that are capable of
generating OAM packets. They are assumed to be within the same
administrative domain as the target SFC enabled domain.

      [202] Why just Virtual Machines and no containers? Suggest to make things
      generic and talk about virtual and physical entities.

<Nagendra> We changed this as virtual entities.

            This comment applies throughout the document.
      [216] Ethernet OAM: Add reference. Do you refer to physical layer Ethernet OAM
      (802.3ah) or CFM (802.1ag)?

<Nagendra> The response was provided in the above comment section.

[243] s/uses the overlay network/uses the overlay
      network layer/

<Nagendra> Done.

[246] Could we add a few examples of "various overlay network
      technologies"? For the underlay network layer several examples are listed.

<Nagendra> Ok.

      [248] What does "mostly transparent" mean?

<Nagendra> The data plane elements connecting the overlay layer nodes
may not always process the overlay header.

...FB: How about we explain this in the document?


[254] What does "tight coupling"
      between the link layer and the physical technology mean?

<Nagendra>I am not sure I understand the nit here. Do you see any
difficulty in parsing the sentence?

...FB: Not sure what "tight coupling" means here. Could you clarify what is
"tight coupling" vs. "not tight coupling"?


[255] Suggest to avoid
      terms like "popular" - popularity can change, standards stay

<Nagendra> Ok. This is changed as "Ethernet is one such choice..."

[256] Acronyms
      "POS" and "DWDM" are not defined

<Nagendra> Added.

[274] Link start/end-points don't seem to
      always align with the underlay network in the diagram

<Nagendra> Fixed it.

[287] s/may comprise
      of/may consist of/

<Nagendra>We fixed it as "may comprise".

[288] s/but not shown/but is not shown/

<Nagendra> We fixed this as "intermediate nodes not shown...".

[307]
      s/devices/device/

<Nagendra> Done.

[308] What is a "controller"?

<Nagendra> We discussed this in the above comment section.

[314] s/includes/include/

<Nagendra>Done.

[319]
      Add hSFC to list of acronyms in section 1.2.1

<Nagendra> This is expanded in the respective section. We added it in
the acronym section as well.

[320] Add IBN to list of acronyms
      in section 1.2.1

<Nagendra> Ok, Done.

[325] s/includes/include/

<Nagendra> Done.
[359] The function/term "controller"
      requires definition.

<Nagendra> Done, as mentioned in the above comment section.

[383] s/?./?/

[398] s/get the got/got/

<Nagendra> Done.

   [461]
      s/devices/device/

<Nagendra> Done.

   [469] Does it have to be equal cost multipath at the service
      layer, or could unequal cost multipath also be an option for load-balancing?

<Nagendra>I didn’t see any discussion specific to ECMP/UCMP in the
architecture RFC.

...FB: Hmm. I did not see that RFC7665 is only about equal cost multipath.

   [521] Not sure whether the overlay network establishes the service plane. Isn't
      it that the overlay network establishes connectivity for the SFC-related
      functions in the service plane?

<Nagendra> The service layer is established over the overlay network
layer. I am not sure if it is right to say overlay network provides
connectivity for service layer.

...FB: Overlay network is one component of the service layer, isn't it? So it is
required but not sufficient.


[531] s/components/component/

[545] remove "underlay"

<Nagendra> Done.

[595] s/devices/device/

<Nagendra> Done.

[600] s/action/an action/

<Nagendra> Done.

[601] Expand on "TTL or other means" (TTL also needs to be added to acronyms in
      1.2.1). Is this specific to NSH? Or specific to IPv4?

<Nagendra> TTL is listed as well-known abbrev in
https://www.rfc-editor.org/materials/abbrev.expansion.txt and so we left it as it is.
TTL in this document refers to NSH TTL field.

...FB: Let's ensure we refer to NSH TTL in this case. Given that SFC can be done
with other means than NSH, implicit reference to NSH might be a problem.

   [630] Mention that for "approximation of
      packet loss for a given SFC can be derived" to be applicable, SFC OAM packets
      would need to be forwarded the same as live user traffic.

<Nagendra> As it is intending to derive the approximate loss value, I
am not sure if we need this additional consideration that the OAM
packet would need to follow the live user traffic. Let me know if you think
otherwise.

...FB: IMHO we should - given that it is one potential complication.
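
For reference, the approximation being discussed amounts to the arithmetic below: a loss ratio derived from OAM probes sent and probes answered. This is a minimal sketch with a hypothetical function name. It only approximates the loss seen by user traffic under the assumption raised in the comment, namely that OAM packets are forwarded the same way as live user traffic.

```python
def approximate_loss_ratio(probes_sent: int, probes_received: int) -> float:
    """Fraction of OAM probes that did not come back."""
    if probes_sent <= 0:
        raise ValueError("probes_sent must be positive")
    if not 0 <= probes_received <= probes_sent:
        raise ValueError("probes_received must be between 0 and probes_sent")
    return (probes_sent - probes_received) / probes_sent

# The "99 responses for 100 probes" case mentioned earlier in the thread.
print(approximate_loss_ratio(100, 99))  # 0.01
```

If the probes take a different path or receive different treatment than user packets, the number describes the probes only.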


   [636] Is uppercase
      "MUST" applicable to an informational document? Especially given that
      RFC2119/RFC8174 is explicitly referenced by the draft.

<Nagendra> Based on various reviewer comments, we removed the use of
any normative statement.

[666] Add MPLS, TRILL to
      acronyms in 1.2.1

<Nagendra> Ok. Done.

[678] s/exhaustive/exhaustive./

<Nagendra> Done.

[720] Is uppercase "SHOULD" applicable to an informational document?
      Especially given that RFC2119/RFC8174 is explicitly referenced by the draft.

<Nagendra> Based on various reviewer comments, we removed the use of
any normative statement.

[722] Is uppercase "MAY" applicable to an informational document? Especially
      given that RFC2119/RFC8174 is explicitly referenced by the draft.

<Nagendra> Based on various reviewer comments, we removed the use of
any normative statement.

[754]
      s/packet/packets/

[755] s/to next node/to the next node/

   [771] How does this requirement align with the earlier paragraph, e.g. in case
      a node sends an ICMP reply? It would probably make sense to scope the
      statement to e.g. NSH.

<Nagendra> As mentioned in the statement, the node that initiates the
OAM packet must set the marker and so this statement is applicable
for the initiating node.

[806]
      s/function/functions/

<Nagendra> Done

[809] s/from relevant node/from the relevant node/

<Nagendra> Done

[810]
      s/generate ICMP/generate an ICMP/

<Nagendra> Done

[812] s/from last/from the last/

<Nagendra> Done

[830]
      s/perform continuity/perform the continuity/

<Nagendra> Done

   [834] s/with relevant/with the relevant/

<Nagendra> Done

[835] s/perform partial SFC availability./perform a partial SFC
      availability check./

<Nagendra> Done

[851] For "In-Situ OAM data fields" add a normative
      reference to draft-ietf-ippm-ioam-data

[905] Add "CLI" to section 1.2.1
      acronyms

<Nagendra> Done

[920] Add a reference for NETCONF ->RFC6241

<Nagendra> Done

Once again, thanks a lot for the great comments.

Regards,
Nagendra

Thanks again for considering the comments in great detail. Much appreciated.

Cheers, Frank








