Reviewer: Matt Joras Review result: Ready with Nits I am the assigned Gen-ART reviewer for this draft. The General Area Review Team (Gen-ART) reviews all IETF documents being processed by the IESG for the IETF Chair. Please treat these comments just like any other last call comments. For more information, please see the FAQ at <https://trac.ietf.org/trac/gen/wiki/GenArtfaq>. Document: draft-ietf-bess-evpn-igmp-mld-proxy-?? Reviewer: Matt Joras Review Date: 2021-08-26 IETF LC End Date: 2021-09-07 IESG Telechat date: Not scheduled for a telechat Review Ethernet Virtual Private Network (EVPN) solution is becoming pervasive in data center (DC) applications for Network Virtualization Overlay (NVO) and DC interconnect (DCI) services, and in service provider (SP) applications for next generation virtual private LAN services. This intro to the abstract could use some rewording. For example, "is becoming" does not read well in a standards document. It is sufficient to describe what this document is specifying. Also "next generation" is overused and doesn't usually read well retrospectively. This draft describes how to support efficiently endpoints running IGMP for the above services over an EVPN network by incorporating IGMP proxy procedures on EVPN PEs. Avoid using the term "draft" as this will have to be edited out since the idea is for this to be standards track. 1. Introduction Ethernet Virtual Private Network (EVPN) solution [RFC7432] is becoming pervasive in data center (DC) applications for Network Virtualization Overlay (NVO) and DC interconnect (DCI) services, and in service provider (SP) applications for next generation virtual private LAN services. This is a copy of the abstract, and has similar issues. The introduction serves a different purpose beyond the abstract, it also has the same grammatical issues as the abstract. In DC applications, a point of delivery (POD) can consist of a collection of servers supported by several top of rack (TOR) and Spine switches. This collection of servers and switches are self contained and may have their own control protocol for intra-POD communication and orchestration. However, EVPN is used as standard way of inter-POD communication for both intra-DC and inter-DC. A subnet can span across multiple PODs and DCs. EVPN provides robust multi-tenant solution with extensive multi-homing capabilities to stretch a subnet (VLAN) across multiple PODs and DCs. There can be many hosts/VMs(virtual machine) (several hundreds) attached to a subnet that is stretched across several PODs and DCs. Why is "Spine" capitalized? I'm also wondering whether another term might be appropriate here that doesn't imply a certain topology. Also, "provides robust multi-tenant solution" should probably be "provides a robust multi-tenant solution". These hosts/VMs express their interests in multicast groups on a given subnet/VLAN by sending IGMP Membership Reports (Joins) for their interested multicast group(s). Furthermore, an IGMP router periodically sends membership queries to find out if there are hosts on that subnet that are still interested in receiving multicast traffic for that group. The IGMP/MLD Proxy solution described in this draft accomplishes three objectives: I don't think you need "/VMs". They are a kind of host. There is also another use of "draft" in this paragraph. 3. Selective Multicast: to forward multicast traffic over EVPN network such that it only gets forwarded to the PEs that have interest in the multicast group(s), multicast traffic will not be forwarded to the PEs that have no receivers attached to them for that multicast group. This draft shows how this objective may be achieved when Ingress Replication is used to distribute the multicast traffic among the PEs. Procedures for supporting selective multicast using P2MP tunnels can be found in [I-D.ietf-bess-evpn-bum-procedure-updates] The first sentence is very long and could probably be reworded to be less redundant. Also there is another instance of "draft". The first two objectives are achieved by using IGMP/MLD proxy on the PE and the third objective is achieved by setting up a multicast tunnel (e.g., ingress replication) only among the PEs that have interest in that multicast group(s) based on the trigger from IGMP/ MLD proxy processes. The proposed solutions for each of these objectives are discussed in the following sections. The first sentence can probably be split into two. Also, is "(e.g., ingress replication)" really an example? "multicast tunnel" probably suffices. o Ethernet Segment (ES): When a customer site (device or network) is connected to one or more PEs via a set of Ethernet links, then that set of links is referred to as an 'Ethernet Segment'. o Ethernet Segment Identifier (ESI): A unique non-zero identifier that identifies an Ethernet Segment is called an 'Ethernet Segment Identifier'. Both of these terminology definitions can drop the end part where they quote the thing they're defining. It is implied by the colon. o Ethernet Tag: An Ethernet tag identifies a particular broadcast domain, e.g., a VLAN. An EVPN instance consists of one or more broadcast domains. Same issue here more or less. You don't need to start out a sentence saying "Ethernet tag" again. 4.1. Proxy Reporting When IGMP protocol is used between hosts/VMs and their first hop EVPN router (EVPN PE), Proxy-reporting is used by the EVPN PE to summarize (when possible) reports received from downstream hosts and propagate them in BGP to other PEs that are interested in the information. This is done by terminating the IGMP Reports in the first hop PE, and translating and exchanging the relevant information among EVPN BGP speakers. The information is again translated back to IGMP message at the recipient EVPN speaker. Thus it helps create an IGMP overlay subnet using BGP. In order to facilitate such an overlay, this document also defines a new EVPN route type NLRI, the EVPN Selective Multicast Ethernet Tag route, along with its procedures to help exchange and register IGMP multicast groups Section 9. Another usage of hosts/VMs. 1. When the first hop PE receives several IGMP Membership Reports (Joins), belonging to the same IGMP version, from different attached hosts/VMs for the same (*,G) or (S,G), it SHOULD send a single BGP message corresponding to the very first IGMP Membership Request (BGP update as soon as possible) for that (*,G) or (S,G). This is because BGP is a stateful protocol and no further transmission of the same report is needed. If the IGMP Membership Request is for (*,G), then multicast group address MUST be sent along with the corresponding version flag (v2 or v3) set. In case of IGMPv3, the exclude flag MUST also be set to indicate that no source IP address must be excluded (include all sources"*"). If the IGMP Join is for (S,G), then besides setting multicast group address along with the version flag v3, the source IP address and the IE flag MUST be set. It should be noted that when advertising the EVPN route for (S,G), the only valid version flag is v3 (v2 flags MUST be set to zero). Another hosts/VMs usage. Also missing a space after "include all sources". 7. Upon receiving EVPN SMET route(s) and before generating the corresponding IGMP Membership Request(s), the PE checks to see whether it has any CE multicast router for that BD on any of its ES's . The PE provides such a check by listening for PIM Hello messages on that AC (i.e, ES,BD). If the PE does have the router's ACs, then the generated IGMP Membership Request(s) are sent to those ACs. If it doesn't have any of the router's AC, then no IGMP Membership Request(s) needs to be generated. This is because sending IGMP Membership Requests to other hosts can result in unintentionally preventing a host from joining a specific multicast group using IGMPv2 - i.e., if the PE does not receive a join from the host it will not forward multicast data to it. Per [RFC4541] , when an IGMPv2 host receives a Membership Report for a group address that it intends to join, the host will suppress its own membership report for the same group, and if the PE does not receive an IGMP Join from host it will not forward multicast data to it. In other words, an IGMPv2 Join MUST NOT be sent on an AC that does not lead to a CE multicast router. This message suppression is a requirement for IGMPv2 hosts. This is not a problem for hosts running IGMPv3 because there is no suppression of IGMP Membership Reports. Need a "the" before "host in "and if the PE does not receive an IGMP Join from host it will not forward multicast data to it." 2. When a PE receives an EVPN SMET route for a given (*,G), it compares the received version flags from the route with its per- PE stored version flags. If the PE finds that a version flag associated with the (*,G) for the remote PE is reset, then the PE MUST generate IGMP Leave for that (*,G) toward its local interface (if any) attached to the multicast router for that multicast group. It should be noted that the received EVPN route SHOULD at least have one version flag set. If all version flags are reset, it is an error because the PE should have received an EVPN route withdraw for the last version flag. Error MUST be considered as BGP error and the PE MUST apply the "treat-as- withdraw" procedure of [RFC7606]. Consider rewording the latter part of this paragraph, note that "Error MUST be considered" is not quite grammatical.. Also if this is an error condition, should the "SHOULD at least have one version flag set" be a MUST? 5. Operation ... o only hosts/VMs o mix of hosts/VMs and multicast source o mix of hosts/VMs, multicast source, and multicast router More hosts/VMs. I will stop mentioning this nit. 6. All-Active Multi-Homing Because the LAG flow hashing algorithm used by the CE is unknown at the PE, in an All-Active redundancy mode it must be assumed that the CE can send a given IGMP message to any one of the multi-homed PEs, either DF or non-DF; i.e., different IGMP Membership Request messages can arrive at different PEs in the redundancy group and furthermore their corresponding Leave messages can arrive at PEs that are different from the ones that received the Join messages. Therefore, all PEs attached to a given ES must coordinate IGMP Membership Request and Leave Group (x,G) state, where x may be either '*' or a particular source S, for each BD on that ES. This allows the DF for that (ES,BD) to correctly advertise or withdraw a Selective Multicast Ethernet Tag (SMET) route for that (x,G) group in that BD when needed. All-Active multihoming PEs for a given ES MUST support IGMP synchronization procedures described in this section if they need to perform IGMP proxy for hosts connected to that ES. "LAG" is undefined. Should "All-Active" really be capitalized? 6.2.2. Common Leave Group Synchronization ... When the Maximum Response Timer expires a PE that has advertised an IGMP Leave Synch route, withdraws it. Any PE attached to the multihomed ES, that started the Maximum Response Time and has no local IGMP Membership Request (x,G) state and no installed IGMP Join Synch routes, it removes IGMP Membership Request (x,G) state for that (ES,BD). If the DF no longer has IGMP Membership Request (x,G) state for that BD on any ES for which it is DF, it withdraws its SMET route for that (x,G) group in that BD. The first sentence should be reworded, ending the sentence with ", withdraws it." reads awkwardly. The next sentence is also long and is confusing to read, I'm actually not quite sure what it is trying to convey. 6.3. Mass Withdraw of Multicast join Sync route in case of failure A PE which has received an IGMP Membership Request, would have synced the IGMP Join by the procedure defined in section 6.1. If a PE with local join state goes down or the PE to CE link goes down, it would lead to a mass withdraw of multicast routes. Remote PEs (PEs where these routes were remote IGMP Joins) SHOULD NOT remove the state immediately; instead General Query SHOULD be generated to refresh the states. There are several ways to detect failure at a peer, e.g. using IGP next hop tracking or ES route withdraw. The first sentence has an extraneous comma after "IGMP Membership Request". 9.1. Selective Multicast Ethernet Tag Route ... o The least significant bit, bit 7 indicates support for IGMP version 1. Since IGMP V1 is being deprecated , sender MUST set it as 0 for IGMP and receiver MUST ignore it. Extraneous comma and space in the second sentence, and missing article before "sender". o The second least significant bit, bit 6 indicates support for IGMP version 2. o The third least significant bit, bit 5 indicates support for IGMP version 3. o The fourth least significant bit, bit 4 indicates whether the (S,G) information carried within the route-type is of an Include Group type (bit value 0) or an Exclude Group type (bit value 1). The Exclude Group type bit MUST be ignored if bit 5 is not set. Is it typical to express this in terms of which least significant bit and the bit number? It reads a bit oddly. It's also only done in some of the descriptions so it is not consistent. o If route is used for IPv6 (MLD) then bit 7 indicates support for MLD version 1. The second least significant bit, bit 6 indicates support for MLD version 2. Since there is no MLD version 3, in case of IPv6 route third least significant bit MUST be 0. In case of IPv6 routes, the fourth least significant bit MUST be ignored if bit 6 is not set. Will there never be a MLD version 3? Also again missing an article before "third least significant bit", though I have similar commentary as above about how to refer to bits individually. 9.1.1. Constructing the Selective Multicast Ethernet Tag route ... Reserved bits MUST be set to 0. They can be defined in future by Why are these a MUST whereas the earlier reserved bits in section 9.1 were SHOULDs? 9.1.2. Default Selective Multicast Route ... Consider the EVPN network of Figure-2, where there is an EVPN instance configured across the PEs. Lets consider PE2 is connected to multicast router R1 and there is a network running PIM ASM behind R1. If there are receivers behind the PIM ASM network, the PIM Join would be forwarded to the PIM RP (Rendezvous Point). If receivers behind PIM ASM network are interested in a multicast flow originated by multicast source S2 (behind PE1), it is necessary for PE2 to receive multicast traffic. In this case PE2 MUST originate a (*,*) SMET route to receive all of the multicast traffic in the EVPN domain. To generate Wildcards (*,*) routes, prcedure from [RFC6625] SHOULD be used. "Lets" should be "Let's", also probably should be "consider that PE2 is connected". The comma in " If there are receivers behind the PIM ASM network, " is extraneous. The last sentence has a typo in "procedure" and is missing "the" before it. 9.2. Multicast Join Synch Route ... Similar commentary for this section about how bits are referred to, both by index and which "least significant bit" they are. o Reserved bits MUST be set to 0. They can be defined in future by other document. Probably don't need the second sentence at all as that's implicit, also "future document" is more grammatical. 9.2.1. Constructing the Multicast Join Synch Route ... The Flags field indicates the version of IGMP protocol from which the Membership Report was received. It also indicates whether the multicast group had INCLUDE or EXCLUDE bit set. Earlier in the section "INCLUDE" and "EXCLUDE" were not capitalized. One should be picked. Reserved bits MUST be set to 0. They can be defined in future by other document. Same commentary as before. -- last-call mailing list last-call@xxxxxxxx https://www.ietf.org/mailman/listinfo/last-call