[Last-Call] Re: Tsvart last call review of draft-ietf-mboned-multicast-telemetry-09


 



Hi Bernard,

Thank you very much for the helpful comments. I've tried to address them all. Please see my responses, marked MM, inline:

-----Original Message-----
From: Bernard Aboba via Datatracker <noreply@xxxxxxxx> 
Sent: Thursday, May 23, 2024 9:16 PM
To: tsv-art@xxxxxxxx
Cc: draft-ietf-mboned-multicast-telemetry.all@xxxxxxxx; last-call@xxxxxxxx; mboned@xxxxxxxx
Subject: Tsvart last call review of draft-ietf-mboned-multicast-telemetry-09

Reviewer: Bernard Aboba
Review result: Ready with Nits

This document has been reviewed as part of the transport area review team's ongoing effort to review key IETF documents. These comments were written primarily for the transport area directors, but are copied to the document's authors and WG to allow them to address any issues raised and also to the IETF discussion list for information.

When done at the time of IETF Last Call, the authors should consider this review as part of the last-call comments they receive. Please always CC tsv-art@xxxxxxxx if you reply to or forward this review.

Document: draft-ietf-mboned-multicast-telemetry
Reviewer: Bernard Aboba
Result: Ready with Nits

This document explains the deficiencies in existing OAM techniques and lays out proposals to address them.  However, it doesn't provide much detail on some of the reasoning.  Also in places, the wording could use improvement.

Comments
----

1.  Introduction

   Multicast has many use cases.  For example, it can be used by
   residential broadband customers across operator networks, private
   MPLS customers, and internal customers within corporate intranet.

[BA] s/intranet/intranets/

MM: Done

   Multicast provides real time interactive online meetings or podcasts,

[BA] Use of multicast for conferencing is rare nowadays. Do you really
   want to include this?

MM: I've kept it but tried to explain it better, please see new intro down below.

   IPTV, and financial markets real-time data, which all have a reliance
   on UDP's unreliable transport.  End-to-end QOS, therefore, should be
   a critical component of multicast deployment in order to provide a
   good end user experience.  In multicast real-time media streaming,
   loss of a single packet containing a reference frame can result in
   the inability of thousands of receivers to decode a whole sequence of
   packets called Group-of-Picture, introducing black picture for
   periods of a few seconds.

   Unexpected long delay in propagation of a
   packet in such real-time media streaming may equally result in the
   packet not being received and create the same results.  Multicast
   packet drops and delay can therefore severely affect the application
   performance and user experience.

[BA] Suggest:

MM: Done. Please see full new intro below.

   "In multicast real-time media streaming, if a single packet is lost
   within a keyframe and cannot be recovered using forward
   error correction, this can result in many receivers being unable
   to decode subsequent frames within the Group of Pictures (GoP), resulting
   in video freezes or black pictures until another keyframe is
   delivered.

   Unexpectedly long delays in delivery of packets can result in
   timeouts with similar results. Multicast packet loss and
   delays can therefore affect application performance and the
   user experience."

   It is important to monitor the performance of the multicast traffic.
   New on-path telemetry techniques such as In-situ OAM (IOAM)
   [RFC9197], IOAM Direct Export (DEX) [RFC9326] IOAM Marking-based
   Postcard (PBT-M) [I-D.song-ippm-postcard-based-telemetry], and Hybrid
   Two-Step (HTS) [I-D.ietf-ippm-hybrid-two-step] are useful and
   complementary to the existing active OAM performance monitoring
   methods (e.g., ICMP ping [RFC0792]),

   provide promising means to
   directly monitor the network experience of multicast traffic.
   However, multicast traffic has some unique characteristics which pose
   some challenges on applying such techniques in an efficient way.

[BA] Suggest:
   "providing a way to monitor multicast performance. However, multicast
   has unique characteristics that make the efficient application
   of these techniques challenging."

MM: Done. Here's the new intro:

1.  Introduction

   IP Multicast has had many useful applications for several decades.
   [I-D.ietf-pim-multicast-lessons-learned] provides a thorough
   historical perspective about the design and deployment of many of the
   multicast routing protocols in use with the various applications.  IP
   Multicast has been used by residential broadband customers across
   operator networks, private MPLS customers and internal customers
   within corporate intranets.  IP Multicast has provided real-time
   interactive online meetings or podcasts, IPTV, and financial markets
   real-time data, which all have a reliance on UDP's unreliable
   transport.  End-to-end QOS, therefore, should be a critical component
   of multicast deployment in order to provide a good end user
   experience.  In multicast real-time media streaming, if a single
   packet is lost within a keyframe and cannot be recovered using
   forward error correction, this can result in many receivers being
   unable to decode subsequent frames within the Group of Pictures
   (GoP), resulting in video freezes or black pictures until another
   keyframe is delivered.  Unexpectedly long delays in delivery of
   packets can result in timeouts with similar results.  Multicast
   packet loss and delays can therefore affect application performance
   and the user experience.

   It is important to monitor the performance of the multicast traffic.
   New on-path telemetry techniques such as In-situ OAM (IOAM)
   [RFC9197], IOAM Direct Export (DEX) [RFC9326], IOAM Marking-based
   Postcard (PBT-M) [I-D.song-ippm-postcard-based-telemetry], and Hybrid
   Two-Step (HTS) [I-D.ietf-ippm-hybrid-two-step] are useful and
   complementary to the existing active OAM performance monitoring
   methods (e.g., ICMP ping [RFC0792]), providing a way to monitor
   multicast performance.  However, multicast has unique characteristics
   that make the efficient application of these techniques challenging.

   The IP multicast packet data for a particular (S, G) state is
   identical from one branch to another on its way to multiple
   receivers.  When adding IOAM trace data to multicast packets, each
   replicated packet would keep the telemetry data for its entire
   forwarding path.  Since the replicated packets all share common path
   segments, redundant data will be collected for the same original
   multicast packet.  Such redundancy consumes extra network bandwidth
   unnecessarily.  For a large multicast tree, such redundancy is
   considerable.  Alternatively, it could be more efficient to collect
   the telemetry data using solutions such as IOAM DEX to eliminate the
   data redundancy.  However, IOAM DEX lacks a branch identifier, making
   telemetry data correlation and multicast-tree reconstruction
   difficult.

   This draft provides two solutions to the IOAM data redundancy problem
   based on the IOAM standards.  The requirements for multicast traffic
   telemetry are discussed along with the issues of the existing on-path
   telemetry techniques.  We propose modifications to make these
   techniques adapt to multicast in order for the original multicast
   tree to be correctly reconstructed while eliminating redundant data.

2.  Requirements for Multicast Traffic Telemetry

   Multicast traffic is forwarded through a multicast tree.  With PIM
   and P2MP, the forwarding tree is established and maintained by the
   multicast routing protocol.  With BIER, no state is created in the
   network to establish a forwarding tree; instead, a BIER header
   provides the necessary information for each packet to know the egress
   points.  Multicast packets are only replicated at each tree branch
   fork node for efficiency.

   There are several requirements for multicast traffic telemetry, a few
   of which are:

   *  Reconstruct and visualize the multicast tree through data plane
      monitoring.

   *  Gather the multicast packet delay and jitter performance on each
      path.

   *  Find the multicast packet drop location and reason.

   *  Gather the VPN state and tunnel information in case of P2MP
      multicast.

   In order to meet these requirements, we need the ability to directly
   monitor the multicast traffic and derive data from the multicast
   packets.  The conventional OAM mechanisms, such as multicast ping
   [RFC6450] and trace [RFC8487], are not sufficient to meet these
   requirements.

[BA] Can you provide more detail on why existing mechanisms are not sufficient?
When conventional mechanisms are combined with RTCP, it seems like the first three requirements are covered.

MM: Yes, here's the new last paragraph in this section. I added RTCP and said, in essence, that these telemetry solutions provide more granular network monitoring with less redundancy:

   In order to meet all of these requirements, we need the ability to
   directly monitor the multicast traffic and derive data from the
   multicast packets.  The conventional OAM mechanisms, such as
   multicast ping [RFC6450], trace [RFC8487], and RTCP [RFC3605], are
   not sufficient to meet all of these requirements.  The telemetry
   methods in this draft meet these requirements by providing granular
   hop-by-hop network monitoring along with a reduction of data
   redundancy.

3.  Issues of Existing Techniques

   On-path Telemetry techniques that directly retrieve data from
   multicast traffic's live network experience are ideal for addressing
   the aforementioned requirements.  The representative techniques
   include In-situ OAM (IOAM) Trace option [RFC9197], IOAM Direct Export
   (DEX) option [RFC9326], and PBT-M
   [I-D.song-ippm-postcard-based-telemetry].  However, unlike unicast,
   multicast poses some unique challenges to applying these techniques.

   Multicast packets are replicated at each branch fork node in the
   corresponding multicast tree.  Therefore, there are multiple copies
   of the original multicast packet in the network.

   If the IOAM trace option is used for on-path data collection, the
   partial trace data will also be replicated into the packet copy for
   each branch.  The end result is that, at the multicast tree leaves,
   each copy of the multicast packet has a complete trace.  Most of the
   data (except data from the last leaf branch) appear in multiple
   copies while only one copy is sufficient.  Data redundancy introduces
   unnecessary header overhead, wastes network bandwidth, and
   complicates the data processing.  The larger the multicast tree, or
   the longer the multicast path, the more severe the redundancy problem
   becomes.
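
The redundancy described above can be made concrete with a rough sketch. It is only an illustration, not part of the draft: the tree layout, the node names, and the simplification that every node adds exactly one trace record are all assumptions made for this example.

```python
# Hypothetical multicast tree (parent -> children); names are illustrative.
TREE = {"A": ["B"], "B": ["C", "D"], "C": ["E", "F"], "D": ["G"]}


def leaf_paths(tree, root):
    """Return every root-to-leaf path in the tree as a list of node names."""
    paths = []
    stack = [[root]]
    while stack:
        path = stack.pop()
        children = tree.get(path[-1], [])
        if not children:
            paths.append(path)
        for child in children:
            stack.append(path + [child])
    return paths


def trace_record_counts(tree, root):
    """Count trace records carried to the leaves versus records needed.

    Assumes each node on a path appends one record, so the copy arriving
    at a leaf carries one record per node on its path.
    """
    paths = leaf_paths(tree, root)
    carried = sum(len(path) for path in paths)
    unique = len({node for path in paths for node in path})
    return carried, unique


carried, unique = trace_record_counts(TREE, "A")
print(f"records carried: {carried}, unique records: {unique}, "
      f"redundant: {carried - unique}")
```

In this assumed tree the three leaf copies carry twelve records between them, while seven would describe the whole tree; the shared segment A-B is reported three times and node C twice.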

   The postcard-based solutions (e.g., IOAM DEX) can be used to
   eliminate such data redundancy, because each node on the tree only
   sends a postcard covering local data.  However, they cannot track and
   correlate the tree branches properly due to the lack of branching
   information, so they can bring confusion about the multicast tree
   topology.  For example, in a multicast tree, Node A has two branches,
   one to Node B and the other to node C; further, Node B leads to Node
   D and Node C leads to Node E.  When applying postcard-based methods,
   one cannot tell whether or not Node D(E) is the next hop of Node B(C)
   from the received postcards alone, unless one correlates the
   exporting nodes with knowledge about the tree collected by other
   means (e.g., mtrace).  Such correlation is undesirable because it
   introduces extra work and complexity.
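
The ambiguity in the Node A to Node E example can be sketched as follows. This is a minimal illustration under stated assumptions: a postcard is modelled as a record holding only the exporting node's name, and the `upstream` field in the second function is hypothetical, added only to show what kind of information is missing. It is not meant to represent the draft's actual encoding.

```python
def postcards(tree, root):
    """One postcard per node, carrying only the exporting node's name."""
    cards = []
    stack = [root]
    while stack:
        node = stack.pop()
        cards.append({"node": node})
        stack.extend(tree.get(node, []))
    return sorted(cards, key=lambda card: card["node"])


def postcards_with_upstream(tree, root):
    """Same postcards plus a hypothetical 'upstream' field naming the parent."""
    cards = []
    stack = [(root, None)]
    while stack:
        node, parent = stack.pop()
        cards.append({"node": node, "upstream": parent})
        stack.extend((child, node) for child in tree.get(node, []))
    return sorted(cards, key=lambda card: card["node"])


# Two different trees over the same five nodes (parent -> children).
TREE_1 = {"A": ["B", "C"], "B": ["D"], "C": ["E"]}  # B -> D, C -> E
TREE_2 = {"A": ["B", "C"], "B": ["E"], "C": ["D"]}  # B -> E, C -> D

# Without branching information the collector sees the same postcards.
print(postcards(TREE_1, "A") == postcards(TREE_2, "A"))
# With some per-branch correlation field the two trees become distinguishable.
print(postcards_with_upstream(TREE_1, "A") == postcards_with_upstream(TREE_2, "A"))
```

Under these assumptions, the plain postcards for the two trees are identical, so the collector cannot tell which of D or E sits behind B without outside knowledge of the tree.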

   The fundamental reason for this problem is that there is not an
   identifier (either implicit or explicit) to correlate the data on
   each branch.

[BA] Can't the IP address be used as an identifier?  Does the proposed solution address this issue in a simpler way? Note that "extra work"
(e.g. new software) that is implemented outside network devices has an advantage over new protocols which can conceivably impact network device footprint and reliability (due to bugs). So the where and how matters with respect to "extra work and complexity".

MM: Haoyu will make this clearer in the next update.

Thanks again.
mike



-- 
last-call mailing list -- last-call@xxxxxxxx
To unsubscribe send an email to last-call-leave@xxxxxxxx



