Hi Bernard,

Thank you much for the helpful comments. I've tried to cover them all. Please see MM in-line:

-----Original Message-----
From: Bernard Aboba via Datatracker <noreply@xxxxxxxx>
Sent: Thursday, May 23, 2024 9:16 PM
To: tsv-art@xxxxxxxx
Cc: draft-ietf-mboned-multicast-telemetry.all@xxxxxxxx; last-call@xxxxxxxx; mboned@xxxxxxxx
Subject: Tsvart last call review of draft-ietf-mboned-multicast-telemetry-09

Reviewer: Bernard Aboba
Review result: Ready with Nits

This document has been reviewed as part of the transport area review team's ongoing effort to review key IETF documents. These comments were written primarily for the transport area directors, but are copied to the document's authors and WG to allow them to address any issues raised and also to the IETF discussion list for information.

When done at the time of IETF Last Call, the authors should consider this review as part of the last-call comments they receive. Please always CC tsv-art@xxxxxxxx if you reply to or forward this review.

Document: draft-ietf-mboned-multicast-telemetry
Reviewer: Bernard Aboba
Result: Ready with Nits

This document explains the deficiencies in existing OAM techniques and lays out proposals to address them. However, it doesn't provide much detail on some of the reasoning. Also in places, the wording could use improvement.

Comments
----

1. Introduction

Multicast has many use cases. For example, it can be used by residential broadband customers across operator networks, private MPLS customers, and internal customers within corporate intranet.

[BA] s/intranet/intranets/

MM: Done

Multicast provides real time interactive online meetings or podcasts,

[BA] Use of multicast for conferencing is rare nowadays. Do you really want to include this?

MM: I've kept it but tried to explain it better, please see new intro down below.

IPTV, and financial markets real-time data, which all have a reliance on UDP's unreliable transport.
End-to-end QOS, therefore, should be a critical component of multicast deployment in order to provide a good end user experience.

In multicast real-time media streaming, loss of a single packet containing a reference frame can result in the inability of thousands of receivers to decode a whole sequence of packets called Group-of-Picture, introducing black picture for periods of a few seconds. Unexpected long delay in propagation of a packet in such real-time media streaming may equally result in the packet not being received and create the same results. Multicast packet drops and delay can therefore severely affect the application performance and user experience.

[BA] Suggest:

MM: Done. Please see full new intro below.

"In multicast real-time media streaming, if a single packet is lost within a keyframe and cannot be recovered using forward error correction, this can result in many receivers being unable to decode subsequent frames within the Group of Pictures (GoP), resulting in video freezes or black pictures until another keyframe is delivered. Unexpectedly long delays in delivery of packets can result in timeouts with similar results. Multicast packet loss and delays can therefore affect application performance and the user experience."

It is important to monitor the performance of the multicast traffic. New on-path telemetry techniques such as In-situ OAM (IOAM) [RFC9197], IOAM Direct Export (DEX) [RFC9326] IOAM Marking-based Postcard (PBT-M) [I-D.song-ippm-postcard-based-telemetry], and Hybrid Two-Step (HTS) [I-D.ietf-ippm-hybrid-two-step] are useful and complementary to the existing active OAM performance monitoring methods (e.g., ICMP ping [RFC0792]), provide promising means to directly monitor the network experience of multicast traffic. However, multicast traffic has some unique characteristics which pose some challenges on applying such techniques in an efficient way.

[BA] Suggest:

"providing a way to monitor multicast performance.
However, multicast has unique characteristics that make the efficient application of these techniques challenging."

MM: Done. Here's the new intro:

1. Introduction

IP Multicast has had many useful applications for several decades. [I-D.ietf-pim-multicast-lessons-learned] provides a thorough historical perspective about the design and deployment of many of the multicast routing protocols in use with the various applications. IP Multicast has been used by residential broadband customers across operator networks, private MPLS customers and internal customers within corporate intranets. IP Multicast has provided real time interactive online meetings or podcasts, IPTV, and financial markets real-time data, which all have a reliance on UDP's unreliable transport. End-to-end QOS, therefore, should be a critical component of multicast deployment in order to provide a good end user experience.

In multicast real-time media streaming, if a single packet is lost within a keyframe and cannot be recovered using forward error correction, this can result in many receivers being unable to decode subsequent frames within the Group of Pictures (GoP), resulting in video freezes or black pictures until another keyframe is delivered. Unexpectedly long delays in delivery of packets can result in timeouts with similar results. Multicast packet loss and delays can therefore affect application performance and the user experience.

It is important to monitor the performance of the multicast traffic. New on-path telemetry techniques such as In-situ OAM (IOAM) [RFC9197], IOAM Direct Export (DEX) [RFC9326], IOAM Marking-based Postcard (PBT-M) [I-D.song-ippm-postcard-based-telemetry], and Hybrid Two-Step (HTS) [I-D.ietf-ippm-hybrid-two-step] are useful and complementary to the existing active OAM performance monitoring methods (e.g., ICMP ping [RFC0792]), providing a way to monitor multicast performance.
However, multicast has unique characteristics that make the efficient application of these techniques challenging.

The IP multicast packet data for a particular (S, G) state is identical from one branch to another on its way to multiple receivers. When adding IOAM trace data to multicast packets, each replicated packet would keep the telemetry data for its entire forwarding path. Since the replicated packets all share common path segments, redundant data will be collected for the same original multicast packet. Such redundancy consumes extra network bandwidth unnecessarily. For a large multicast tree, such redundancy is considerable. Alternatively, it could be more efficient to collect the telemetry data using solutions such as IOAM DEX to eliminate the data redundancy. However, IOAM DEX lacks a branch identifier, making telemetry data correlation and multicast-tree reconstruction difficult.

This draft provides two solutions to the IOAM data redundancy problem based on the IOAM standards. The requirements for multicast traffic telemetry are discussed along with the issues of the existing on-path telemetry techniques. We propose modifications to make these techniques adapt to multicast in order for the original multicast tree to be correctly reconstructed while eliminating redundant data.

2. Requirements for Multicast Traffic Telemetry

Multicast traffic is forwarded through a multicast tree. With PIM and P2MP, the forwarding tree is established and maintained by the multicast routing protocol. With BIER, no state is created in the network to establish a forwarding tree; instead, a BIER header provides the necessary information for each packet to know the egress points. Multicast packets are only replicated at each tree branch fork node for efficiency.

There are several requirements for multicast traffic telemetry, a few of which are:

* Reconstruct and visualize the multicast tree through data plane monitoring.
* Gather the multicast packet delay and jitter performance on each path.
* Find the multicast packet drop location and reason.
* Gather the VPN state and tunnel information in case of P2MP multicast.

In order to meet these requirements, we need the ability to directly monitor the multicast traffic and derive data from the multicast packets. The conventional OAM mechanisms, such as multicast ping [RFC6450] and trace [RFC8487], are not sufficient to meet these requirements.

[BA] Can you provide more detail on why existing mechanisms are not sufficient? When conventional mechanisms are combined with RTCP, it seems like the first three requirements are covered.

MM: Yes, here's the new last paragraph in this section. I added RTCP and basically said these telemetry solutions provide more granular network monitoring along with less redundancy:

In order to meet all of these requirements, we need the ability to directly monitor the multicast traffic and derive data from the multicast packets. The conventional OAM mechanisms, such as multicast ping [RFC6450], trace [RFC8487], and RTCP [RFC3605] are not sufficient to meet all of these requirements. The telemetry methods in this draft do meet these requirements by providing granular hop-by-hop network monitoring along with the reduction of data redundancy.

3. Issues of Existing Techniques

On-path Telemetry techniques that directly retrieve data from multicast traffic's live network experience are ideal for addressing the aforementioned requirements. The representative techniques include In-situ OAM (IOAM) Trace option [RFC9197], IOAM Direct Export (DEX) option [RFC9326], and PBT-M [I-D.song-ippm-postcard-based-telemetry]. However, unlike unicast, multicast poses some unique challenges to applying these techniques.

Multicast packets are replicated at each branch fork node in the corresponding multicast tree. Therefore, there are multiple copies of the original multicast packet in the network.
If the IOAM trace option is used for on-path data collection, the partial trace data will also be replicated into the packet copy for each branch. The end result is that, at the multicast tree leaves, each copy of the multicast packet has a complete trace. Most of the data (except data from the last leaf branch) appear in multiple copies while only one copy is sufficient. Data redundancy introduces unnecessary header overhead, wastes network bandwidth, and complicates the data processing. The larger the multicast tree, or the longer the multicast path, the more severe the redundancy problem becomes.

The postcard-based solutions (e.g., IOAM DEX) can be used to eliminate such data redundancy, because each node on the tree only sends a postcard covering local data. However, they cannot track and correlate the tree branches properly due to the lack of branching information, so they can bring confusion about the multicast tree topology. For example, in a multicast tree, Node A has two branches, one to Node B and the other to Node C; further, Node B leads to Node D and Node C leads to Node E. When applying postcard-based methods, one cannot tell whether or not Node D(E) is the next hop of Node B(C) from the received postcards alone, unless one correlates the exporting nodes with knowledge about the tree collected by other means (e.g., mtrace). Such correlation is undesirable because it introduces extra work and complexity.

The fundamental reason for this problem is that there is not an identifier (either implicit or explicit) to correlate the data on each branch.

[BA] Can't the IP address be used as an identifier? Does the proposed solution address this issue in a simpler way? Note that "extra work" (e.g. new software) that is implemented outside network devices has an advantage over new protocols which can conceivably impact network device footprint and reliability (due to bugs). So the where and how matters with respect to "extra work and complexity".
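
For readers following the Node A to Node E example quoted above, here is a rough Python sketch of the two effects the draft text describes. It is a toy model, not anything taken from the draft or from the IOAM specifications: the names `TREE`, `SWAPPED`, `leaf_paths`, `trace_records`, `postcards`, and `edges` are hypothetical, a trace record is reduced to a node name, and a postcard is assumed to carry only the exporting node's name with no parent or branch information.

```python
from collections import Counter

# Hypothetical tree from the example: A forks to B and C; B leads to D, C leads to E.
TREE = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}

# A different tree over the same nodes: B leads to E and C leads to D.
SWAPPED = {"A": ["B", "C"], "B": ["E"], "C": ["D"], "D": [], "E": []}


def leaf_paths(tree, root):
    """Return the root-to-leaf path followed by each replicated packet copy."""
    children = tree[root]
    if not children:
        return [[root]]
    return [[root] + path for child in children for path in leaf_paths(tree, child)]


def trace_records(tree, root):
    """Trace-option model: each copy reaching a leaf carries one record per hop."""
    return Counter(node for path in leaf_paths(tree, root) for node in path)


def postcards(tree):
    """Postcard model (assumed): one record per node, with no branch information."""
    return sorted(tree)


def edges(tree):
    """The parent-child links a collector would need to rebuild the tree."""
    return sorted((parent, child) for parent, kids in tree.items() for child in kids)


if __name__ == "__main__":
    records = trace_records(TREE, "A")
    # Node A's record is carried by both leaf copies, so it is counted twice.
    print("trace records per node:", dict(records))
    print("total trace records:", sum(records.values()), "for", len(TREE), "nodes")

    # Both trees yield the same postcards, although their links differ.
    print("same postcards:", postcards(TREE) == postcards(SWAPPED))
    print("same links:", edges(TREE) == edges(SWAPPED))
```

In this model the trace option carries six records for five nodes, and the duplication grows with every fork above a leaf. The postcard sets for the two trees are identical even though the links differ, which is the ambiguity the draft text attributes to the missing branch identifier. Whether an IP address or some other field could serve as that identifier is outside what this sketch models.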

MM: Haoyu will make this clearer in the next update.

Thanks again.
mike