Thanks for the comments. I have addressed most of them; please find my questions inline.
Reviewer: Matt Joras
Review result: Ready with Nits
I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair. Please treat these comments just
like any other last call comments.
For more information, please see the FAQ at
<https://trac.ietf.org/trac/gen/wiki/GenArtfaq>.
Document: draft-ietf-bess-evpn-igmp-mld-proxy-??
Reviewer: Matt Joras
Review Date: 2021-08-26
IETF LC End Date: 2021-09-07
IESG Telechat date: Not scheduled for a telechat
Review
Ethernet Virtual Private Network (EVPN) solution is becoming
pervasive in data center (DC) applications for Network Virtualization
Overlay (NVO) and DC interconnect (DCI) services, and in service
provider (SP) applications for next generation virtual private LAN
services.
This intro to the abstract could use some rewording. For example, "is becoming"
does not read well in a standards document. It is sufficient to describe what
this document is specifying. Also "next generation" is overused and doesn't
usually read well retrospectively.
This draft describes how to support efficiently endpoints running
IGMP for the above services over an EVPN network by incorporating
IGMP proxy procedures on EVPN PEs.
Avoid using the term "draft" as this will have to be edited out since the idea
is for this to be standards track.
Modified
1. Introduction
Ethernet Virtual Private Network (EVPN) solution [RFC7432] is
becoming pervasive in data center (DC) applications for Network
Virtualization Overlay (NVO) and DC interconnect (DCI) services, and
in service provider (SP) applications for next generation virtual
private LAN services.
This is a copy of the abstract and has similar issues. The introduction should
serve a purpose beyond that of the abstract, and it also has the same
grammatical issues as the abstract.
Modified
In DC applications, a point of delivery (POD) can consist of a
collection of servers supported by several top of rack (TOR) and
Spine switches. This collection of servers and switches are self
contained and may have their own control protocol for intra-POD
communication and orchestration. However, EVPN is used as standard
way of inter-POD communication for both intra-DC and inter-DC. A
subnet can span across multiple PODs and DCs. EVPN provides robust
multi-tenant solution with extensive multi-homing capabilities to
stretch a subnet (VLAN) across multiple PODs and DCs. There can be
many hosts/VMs(virtual machine) (several hundreds) attached to a
subnet that is stretched across several PODs and DCs.
Why is "Spine" capitalized? I'm also wondering whether another term might be
appropriate here that doesn't imply a certain topology. Also, "provides robust
multi-tenant solution" should probably be "provides a robust multi-tenant
solution".
Modified
These hosts/VMs express their interests in multicast groups on a
given subnet/VLAN by sending IGMP Membership Reports (Joins) for
their interested multicast group(s). Furthermore, an IGMP router
periodically sends membership queries to find out if there are hosts
on that subnet that are still interested in receiving multicast
traffic for that group. The IGMP/MLD Proxy solution described in
this draft accomplishes three objectives:
I don't think you need "/VMs". They are a kind of host. There is also another
use of "draft" in this paragraph.
Doesn't using "VM" alongside "host" serve a purpose, by making clear that a host and a VM denote similar devices, much like "subnet/VLAN"?
That said, I am OK with removing "VM".
3. Selective Multicast: to forward multicast traffic over EVPN
network such that it only gets forwarded to the PEs that have
interest in the multicast group(s), multicast traffic will not be
forwarded to the PEs that have no receivers attached to them for
that multicast group. This draft shows how this objective may be
achieved when Ingress Replication is used to distribute the
multicast traffic among the PEs. Procedures for supporting
selective multicast using P2MP tunnels can be found in
[I-D.ietf-bess-evpn-bum-procedure-updates]
The first sentence is very long and could probably be reworded to be less
redundant. Also there is another instance of "draft".
Modified
The first two objectives are achieved by using IGMP/MLD proxy on the
PE and the third objective is achieved by setting up a multicast
tunnel (e.g., ingress replication) only among the PEs that have
interest in that multicast group(s) based on the trigger from IGMP/
MLD proxy processes. The proposed solutions for each of these
objectives are discussed in the following sections.
The first sentence can probably be split into two. Also, is "(e.g., ingress
replication)" really an example? "multicast tunnel" probably suffices.
Modified
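To make the third objective concrete, here is a minimal Python sketch, for illustration only, of how a PE using ingress replication might pick the remote PEs to copy a flow to, based on the SMET routes it has received. The class and method names (`SmetTable`, `replication_list`) are hypothetical, and the sketch assumes one table per BD with PE names and addresses as plain strings.

```python
class SmetTable:
    """Hypothetical store of SMET interest learned from remote PEs in one BD."""

    def __init__(self):
        # (source, group) -> set of remote PE names; source "*" means any source
        self._interest = {}

    def add_route(self, pe, source, group):
        self._interest.setdefault((source, group), set()).add(pe)

    def withdraw_route(self, pe, source, group):
        self._interest.get((source, group), set()).discard(pe)

    def replication_list(self, source, group):
        """Remote PEs that should receive a copy of traffic for (source, group)."""
        exact = self._interest.get((source, group), set())
        any_source = self._interest.get(("*", group), set())
        # PEs with no matching route get no copy at all.
        return sorted(exact | any_source)


table = SmetTable()
table.add_route("PE2", "*", "239.1.1.1")
table.add_route("PE3", "10.0.0.1", "239.1.1.1")
print(table.replication_list("10.0.0.1", "239.1.1.1"))  # ['PE2', 'PE3']
print(table.replication_list("10.0.0.9", "239.1.1.1"))  # ['PE2']
```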
o Ethernet Segment (ES): When a customer site (device or network) is
connected to one or more PEs via a set of Ethernet links, then
that set of links is referred to as an 'Ethernet Segment'.
o Ethernet Segment Identifier (ESI): A unique non-zero identifier
that identifies an Ethernet Segment is called an 'Ethernet Segment
Identifier'.
Both of these terminology definitions can drop the end part where they quote
the thing they're defining. It is implied by the colon.
Modified
o Ethernet Tag: An Ethernet tag identifies a particular broadcast
domain, e.g., a VLAN. An EVPN instance consists of one or more
broadcast domains.
Same issue here more or less. You don't need to start out a sentence saying
"Ethernet tag" again.
Modified
4.1. Proxy Reporting
When IGMP protocol is used between hosts/VMs and their first hop EVPN
router (EVPN PE), Proxy-reporting is used by the EVPN PE to summarize
(when possible) reports received from downstream hosts and propagate
them in BGP to other PEs that are interested in the information.
This is done by terminating the IGMP Reports in the first hop PE, and
translating and exchanging the relevant information among EVPN BGP
speakers. The information is again translated back to IGMP message
at the recipient EVPN speaker. Thus it helps create an IGMP overlay
subnet using BGP. In order to facilitate such an overlay, this
document also defines a new EVPN route type NLRI, the EVPN Selective
Multicast Ethernet Tag route, along with its procedures to help
exchange and register IGMP multicast groups Section 9.
Another usage of hosts/VMs.
Modified
1. When the first hop PE receives several IGMP Membership Reports
(Joins), belonging to the same IGMP version, from different
attached hosts/VMs for the same (*,G) or (S,G), it SHOULD send a
single BGP message corresponding to the very first IGMP
Membership Request (BGP update as soon as possible) for that
(*,G) or (S,G). This is because BGP is a stateful protocol and
no further transmission of the same report is needed. If the
IGMP Membership Request is for (*,G), then multicast group
address MUST be sent along with the corresponding version flag
(v2 or v3) set. In case of IGMPv3, the exclude flag MUST also be
set to indicate that no source IP address must be excluded
(include all sources"*"). If the IGMP Join is for (S,G), then
besides setting multicast group address along with the version
flag v3, the source IP address and the IE flag MUST be set. It
should be noted that when advertising the EVPN route for (S,G),
the only valid version flag is v3 (v2 flags MUST be set to zero).
Another hosts/VMs usage. Also missing a space after "include all sources".
Modified
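A minimal Python sketch of the suppression rule in item 1 above, assuming a hypothetical per-BD `ProxyReporter`: only the first report for a given (x,G) and IGMP version triggers a BGP update, and (S,G) state is accepted only with IGMPv3. Flag encoding and the BGP machinery itself are left out.

```python
class ProxyReporter:
    """Hypothetical per-BD IGMP proxy state on a first-hop PE."""

    def __init__(self):
        # (source, group, version) -> hosts that have reported it
        self._members = {}

    def on_report(self, host, source, group, version):
        """Record a Membership Report and return True if a BGP update must be sent.

        source is "*" for a (*,G) report.
        """
        if source != "*" and version != 3:
            raise ValueError("(S,G) state can only be carried with the IGMPv3 flag")
        key = (source, group, version)
        first_report = key not in self._members
        self._members.setdefault(key, set()).add(host)
        # BGP is stateful: later reports for the same key need no new update.
        return first_report


proxy = ProxyReporter()
print(proxy.on_report("h1", "*", "239.1.1.1", 2))  # True: first (*,G) v2 report
print(proxy.on_report("h2", "*", "239.1.1.1", 2))  # False: already advertised
print(proxy.on_report("h3", "*", "239.1.1.1", 3))  # True: the version flags change
```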
7. Upon receiving EVPN SMET route(s) and before generating the
corresponding IGMP Membership Request(s), the PE checks to see
whether it has any CE multicast router for that BD on any of its
ES's . The PE provides such a check by listening for PIM Hello
messages on that AC (i.e, ES,BD). If the PE does have the
router's ACs, then the generated IGMP Membership Request(s) are
sent to those ACs. If it doesn't have any of the router's AC,
then no IGMP Membership Request(s) needs to be generated. This
is because sending IGMP Membership Requests to other hosts can
result in unintentionally preventing a host from joining a
specific multicast group using IGMPv2 - i.e., if the PE does not
receive a join from the host it will not forward multicast data
to it. Per [RFC4541] , when an IGMPv2 host receives a Membership
Report for a group address that it intends to join, the host will
suppress its own membership report for the same group, and if the
PE does not receive an IGMP Join from host it will not forward
multicast data to it. In other words, an IGMPv2 Join MUST NOT be
sent on an AC that does not lead to a CE multicast router. This
message suppression is a requirement for IGMPv2 hosts. This is
not a problem for hosts running IGMPv3 because there is no
suppression of IGMP Membership Reports.
Need a "the" before "host" in "and if the PE does not receive an IGMP Join from
host it will not forward multicast data to it."
Modified
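The rule in item 7 can be sketched in Python as below. `acs_for_generated_reports` is a hypothetical helper; it assumes the PE already tracks, per (ES,BD), whether PIM Hellos have been heard on that AC.

```python
def acs_for_generated_reports(acs, pim_hello_seen):
    """Return the ACs on which generated IGMP Membership Requests may be sent.

    acs: list of (es, bd) attachment circuits in the BD.
    pim_hello_seen: set of (es, bd) ACs on which PIM Hellos have been received,
    i.e. ACs that lead to a CE multicast router.
    """
    # Never send a generated report toward plain hosts: an IGMPv2 host hearing
    # it would suppress its own report, and the PE, never receiving that
    # host's report, would not forward the group to it.
    return [ac for ac in acs if ac in pim_hello_seen]


acs = [("ES1", "BD10"), ("ES2", "BD10")]
print(acs_for_generated_reports(acs, {("ES2", "BD10")}))  # [('ES2', 'BD10')]
print(acs_for_generated_reports(acs, set()))  # []: no reports are generated
```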
2. When a PE receives an EVPN SMET route for a given (*,G), it
compares the received version flags from the route with its per-
PE stored version flags. If the PE finds that a version flag
associated with the (*,G) for the remote PE is reset, then the PE
MUST generate IGMP Leave for that (*,G) toward its local
interface (if any) attached to the multicast router for that
multicast group. It should be noted that the received EVPN route
SHOULD at least have one version flag set. If all version flags
are reset, it is an error because the PE should have received an
EVPN route withdraw for the last version flag. Error MUST be
considered as BGP error and the PE MUST apply the "treat-as-
withdraw" procedure of [RFC7606].
Consider rewording the latter part of this paragraph; note that "Error MUST be
considered" is not quite grammatical. Also, if this is an error condition,
should "SHOULD at least have one version flag set" be a MUST?
Modified
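A minimal Python sketch of the version-flag comparison in item 2, for illustration only. `RemoteSmetVersions` is hypothetical; it models the flags as a set of IGMP version numbers and leaves the actual Leave generation and BGP processing to the caller.

```python
class RemoteSmetVersions:
    """Hypothetical record of version flags seen in remote (*,G) SMET routes."""

    def __init__(self):
        self._versions = {}  # (remote_pe, group) -> set of IGMP versions

    def on_smet_route(self, remote_pe, group, versions):
        """Process a received (*,G) SMET route.

        Returns ("update", reset) where reset holds the versions whose flag was
        reset (the caller generates an IGMP Leave for each toward the local
        multicast-router interface), or ("treat-as-withdraw", removed) when no
        version flag is set at all.
        """
        key = (remote_pe, group)
        versions = set(versions)
        if not versions:
            # Error: the last flag should have been removed by a withdraw.
            # Apply RFC 7606 treat-as-withdraw and drop the stored state.
            return "treat-as-withdraw", self._versions.pop(key, set())
        previous = self._versions.get(key, set())
        self._versions[key] = versions
        return "update", previous - versions


state = RemoteSmetVersions()
print(state.on_smet_route("PE1", "239.1.1.1", {2, 3}))  # ('update', set())
print(state.on_smet_route("PE1", "239.1.1.1", {3}))  # ('update', {2})
print(state.on_smet_route("PE1", "239.1.1.1", set()))  # ('treat-as-withdraw', {3})
```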
5. Operation
...
o only hosts/VMs
o mix of hosts/VMs and multicast source
o mix of hosts/VMs, multicast source, and multicast router
More hosts/VMs. I will stop mentioning this nit.
Removed all mentions of VMs.
6. All-Active Multi-Homing
Because the LAG flow hashing algorithm used by the CE is unknown at
the PE, in an All-Active redundancy mode it must be assumed that the
CE can send a given IGMP message to any one of the multi-homed PEs,
either DF or non-DF; i.e., different IGMP Membership Request messages
can arrive at different PEs in the redundancy group and furthermore
their corresponding Leave messages can arrive at PEs that are
different from the ones that received the Join messages. Therefore,
all PEs attached to a given ES must coordinate IGMP Membership
Request and Leave Group (x,G) state, where x may be either '*' or a
particular source S, for each BD on that ES. This allows the DF for
that (ES,BD) to correctly advertise or withdraw a Selective Multicast
Ethernet Tag (SMET) route for that (x,G) group in that BD when
needed. All-Active multihoming PEs for a given ES MUST support IGMP
synchronization procedures described in this section if they need to
perform IGMP proxy for hosts connected to that ES.
"LAG" is undefined. Should "All-Active" really be capitalized?
LAG is now defined. RFC 7432 capitalizes "All-Active" throughout; shouldn't this document stay aligned with RFC 7432?
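To illustrate why the coordination is needed, the following Python sketch (hypothetical names, with sets of tuples standing in for real state tables) shows how a DF might decide whether to keep an SMET route for (x,G) in a BD when the Join may have arrived on a peer PE and been learned only through an IGMP Join Synch route.

```python
def smet_needed(bd, xg, df_for_es, local_joins, synced_joins):
    """Return True if this PE, as DF, should advertise an SMET route for xg in bd.

    df_for_es: ESs on which this PE is the DF for bd.
    local_joins: (es, bd, xg) states learned from IGMP reports received locally.
    synced_joins: (es, bd, xg) states learned from peers' IGMP Join Synch routes.
    xg is (source, group), where source may be "*".
    """
    state = local_joins | synced_joins
    return any((es, bd, xg) in state for es in df_for_es)


local = {("ES1", "BD10", ("*", "239.1.1.1"))}
synced = {("ES2", "BD10", ("*", "239.2.2.2"))}  # the Join landed on the peer PE
print(smet_needed("BD10", ("*", "239.2.2.2"), {"ES1", "ES2"}, local, synced))  # True
print(smet_needed("BD10", ("*", "239.2.2.2"), {"ES1"}, local, synced))  # False
```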
6.2.2. Common Leave Group Synchronization
...
When the Maximum Response Timer expires a PE that has advertised an
IGMP Leave Synch route, withdraws it. Any PE attached to the
multihomed ES, that started the Maximum Response Time and has no
local IGMP Membership Request (x,G) state and no installed IGMP Join
Synch routes, it removes IGMP Membership Request (x,G) state for that
(ES,BD). If the DF no longer has IGMP Membership Request (x,G) state
for that BD on any ES for which it is DF, it withdraws its SMET route
for that (x,G) group in that BD.
The first sentence should be reworded; ending it with ", withdraws it." reads
awkwardly. The next sentence is also long and confusing to read; I'm actually
not quite sure what it is trying to convey.
Does this make sense?
"When the Maximum Response Timer expires, a PE that has advertised an IGMP Leave Synch route withdraws that Leave Synch route."
The next sentence is long, but it covers all the conditions that must hold before the local state is cleaned up. It describes when the local join state on an (ES,BD) is removed after the Maximum Response Timer has expired, which happens only if:
- there is no more local join state on any bridge port for the BD, and
- there is no more remote join state on any port for the BD.
The only alternative I see is to break the sentence into these smaller points. Please let me know your view.
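For discussion, the cleanup condition at Maximum Response Timer expiry could be sketched in Python as follows. `GroupState` and its fields are hypothetical and model a single (x,G) on one (ES,BD). After the state is removed, the DF would still need to check whether any ES for which it is DF keeps (x,G) state in the BD before withdrawing its SMET route.

```python
from dataclasses import dataclass, field


@dataclass
class GroupState:
    """Hypothetical (x,G) state for one (ES,BD) on a multihomed PE."""
    timer_started: bool = False           # this PE started the Maximum Response Timer
    local_join: bool = False              # state from locally received reports
    join_synch_from: set = field(default_factory=set)  # peers with installed Join Synch routes
    leave_synch_advertised: bool = False  # this PE advertised a Leave Synch route
    installed: bool = True                # (x,G) state still present for (ES,BD)

    def on_max_response_timer_expiry(self):
        """Apply the cleanup rules and return the actions taken, in order."""
        actions = []
        if self.leave_synch_advertised:
            self.leave_synch_advertised = False
            actions.append("withdraw Leave Synch route")
        if self.timer_started and not self.local_join and not self.join_synch_from:
            self.installed = False
            actions.append("remove (x,G) state for (ES,BD)")
        self.timer_started = False
        return actions


expired = GroupState(timer_started=True, leave_synch_advertised=True)
print(expired.on_max_response_timer_expiry())
# ['withdraw Leave Synch route', 'remove (x,G) state for (ES,BD)']
refreshed = GroupState(timer_started=True, join_synch_from={"PE2"})
print(refreshed.on_max_response_timer_expiry())  # []: a peer still has a join
```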
6.3. Mass Withdraw of Multicast join Sync route in case of failure
A PE which has received an IGMP Membership Request, would have synced
the IGMP Join by the procedure defined in section 6.1. If a PE with
local join state goes down or the PE to CE link goes down, it would
lead to a mass withdraw of multicast routes. Remote PEs (PEs where
these routes were remote IGMP Joins) SHOULD NOT remove the state
immediately; instead General Query SHOULD be generated to refresh the
states. There are several ways to detect failure at a peer, e.g.
using IGP next hop tracking or ES route withdraw.
The first sentence has an extraneous comma after "IGMP Membership Request".
Modified
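A minimal Python sketch of the "refresh rather than remove" behavior, for illustration only. `RemoteJoinTable` is hypothetical, and marking entries stale, then expiring those not refreshed once the query responses were due, is an assumed way to implement the SHOULD, not a procedure taken from the draft.

```python
class RemoteJoinTable:
    """Hypothetical store of IGMP Join Synch state learned from peer PEs."""

    def __init__(self, send_general_query):
        self._routes = {}  # (peer, es, bd, xg) -> "active" or "stale"
        self._send_general_query = send_general_query  # callback taking a BD

    def install(self, peer, es, bd, xg):
        self._routes[(peer, es, bd, xg)] = "active"

    def state(self, peer, es, bd, xg):
        return self._routes.get((peer, es, bd, xg))

    def on_peer_failure(self, peer):
        """Peer or PE-CE link failure detected (e.g. via IGP next-hop tracking)."""
        affected_bds = set()
        for key in list(self._routes):
            if key[0] == peer:
                self._routes[key] = "stale"  # keep the state for now
                affected_bds.add(key[2])
        for bd in sorted(affected_bds):
            self._send_general_query(bd)  # hosts re-report to a surviving PE

    def expire_stale(self):
        """Drop entries that were not refreshed by the time responses were due."""
        stale = [key for key, s in self._routes.items() if s == "stale"]
        for key in stale:
            del self._routes[key]
        return stale


queries = []
joins = RemoteJoinTable(queries.append)
joins.install("PE1", "ES1", "BD10", ("*", "239.1.1.1"))
joins.on_peer_failure("PE1")
print(queries)  # ['BD10']
print(joins.expire_stale())  # [('PE1', 'ES1', 'BD10', ('*', '239.1.1.1'))]
```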
9.1. Selective Multicast Ethernet Tag Route
...
o The least significant bit, bit 7 indicates support for IGMP
version 1. Since IGMP V1 is being deprecated , sender MUST set it
as 0 for IGMP and receiver MUST ignore it.
Extraneous comma and space in the second sentence, and missing article before
"sender".
Modified
o The second least significant bit, bit 6 indicates support for IGMP
version 2.
o The third least significant bit, bit 5 indicates support for IGMP
version 3.
o The fourth least significant bit, bit 4 indicates whether the
(S,G) information carried within the route-type is of an Include
Group type (bit value 0) or an Exclude Group type (bit value 1).
The Exclude Group type bit MUST be ignored if bit 5 is not set.
Is it typical to express this in terms of which least significant bit and the
bit number? It reads a bit oddly. It's also only done in some of the
descriptions so it is not consistent.
This comment is not clear to me. Does it mean the document should not describe each bit individually?
o If route is used for IPv6 (MLD) then bit 7 indicates support for
MLD version 1. The second least significant bit, bit 6 indicates
support for MLD version 2. Since there is no MLD version 3, in
case of IPv6 route third least significant bit MUST be 0. In case
of IPv6 routes, the fourth least significant bit MUST be ignored
if bit 6 is not set.
Will there never be a MLD version 3? Also again missing an article before
"third least significant bit", though I have similar commentary as above about
how to refer to bits individually.
As of today, MLDv2 is the IPv6 counterpart of IGMPv3. If MLD version 3 or later is defined in the future, new bits and procedures would be required.
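To help settle how the bits are referred to, here is a Python sketch of the Flags octet as described above. It assumes the usual IETF convention that bit 0 is the most significant bit, so bit 7 is mask 0x01, bit 6 is 0x02, bit 5 is 0x04 and bit 4 is 0x08. The function names are hypothetical, and treating bits 0-3 as reserved (sent as 0, ignored on receipt) is an assumption.

```python
# IETF convention: bit 0 is the most significant bit of the Flags octet and
# bit 7 the least significant.
def bit_mask(bit):
    return 1 << (7 - bit)


V1_BIT, V2_BIT, V3_BIT, IE_BIT = (bit_mask(b) for b in (7, 6, 5, 4))


def encode_flags(versions, exclude=False, ipv6=False):
    """Build the Flags octet; versions are IGMP versions, or MLD versions if ipv6."""
    # IGMPv1 is never advertised; for MLD, bit 7 = MLDv1 and bit 6 = MLDv2.
    masks = {1: V1_BIT, 2: V2_BIT} if ipv6 else {2: V2_BIT, 3: V3_BIT}
    flags = 0
    for version in versions:
        if version not in masks:
            raise ValueError(f"unsupported version {version} (ipv6={ipv6})")
        flags |= masks[version]
    if exclude:
        flags |= IE_BIT
    return flags  # bits 0-3 (assumed reserved) are left at 0


def decode_flags(octet, ipv6=False):
    """Return (versions, exclude), ignoring bits the receiver must ignore."""
    masks = {1: V1_BIT, 2: V2_BIT} if ipv6 else {2: V2_BIT, 3: V3_BIT}
    versions = {v for v, m in masks.items() if octet & m}
    # The Include/Exclude bit only counts when IGMPv3 (or MLDv2) is set.
    ie_valid = bool(octet & (V2_BIT if ipv6 else V3_BIT))
    return versions, ie_valid and bool(octet & IE_BIT)


print(hex(encode_flags({2, 3})))  # 0x6
print(hex(encode_flags({3}, exclude=True)))  # 0xc
print(decode_flags(0x0A))  # ({2}, False): IE bit ignored without IGMPv3
print(decode_flags(0x0A, ipv6=True))  # ({2}, True)
```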
9.1.1. Constructing the Selective Multicast Ethernet Tag route
...
Reserved bits MUST be set to 0. They can be defined in future by
Why are these a MUST whereas the earlier reserved bits in section 9.1 were
SHOULDs?
Is it OK to require that all reserved bits MUST be set to 0?
9.1.2. Default Selective Multicast Route
...
Consider the EVPN network of Figure-2, where there is an EVPN
instance configured across the PEs. Lets consider PE2 is connected
to multicast router R1 and there is a network running PIM ASM behind
R1. If there are receivers behind the PIM ASM network, the PIM Join
would be forwarded to the PIM RP (Rendezvous Point). If receivers
behind PIM ASM network are interested in a multicast flow originated
by multicast source S2 (behind PE1), it is necessary for PE2 to
receive multicast traffic. In this case PE2 MUST originate a (*,*)
SMET route to receive all of the multicast traffic in the EVPN
domain. To generate Wildcards (*,*) routes, prcedure from [RFC6625]
SHOULD be used.
"Lets" should be "Let's", and it should probably be "consider that PE2 is
connected". The comma in " If there are receivers behind the PIM ASM network, "
is extraneous. The last sentence has a typo in "procedure" and is missing "the"
before it.
Modified
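A minimal Python sketch of the PE2 behavior in this example, assuming a hypothetical `smet_routes_to_originate` helper and a simplified wildcard match where "*" matches any source or group. It is only a simplification in the spirit of [RFC6625], not its full procedure.

```python
def smet_routes_to_originate(local_joins, has_ce_mcast_router):
    """Return the (source, group) SMET routes a PE originates for one BD.

    local_joins: (source, group) pairs from IGMP proxy state; "*" = wildcard.
    has_ce_mcast_router: True if PIM Hellos were received on a local AC.
    """
    routes = set(local_joins)
    if has_ce_mcast_router:
        # Receivers behind the router (e.g. behind R1's PIM ASM network) may
        # want any flow in the EVPN domain, so ask for all of it.
        routes.add(("*", "*"))
    return routes


def route_matches(route, source, group):
    """Simplified wildcard match: "*" in the route matches any source or group."""
    route_source, route_group = route
    return route_source in ("*", source) and route_group in ("*", group)


pe2_routes = smet_routes_to_originate(set(), has_ce_mcast_router=True)
print(pe2_routes)  # {('*', '*')}
print(any(route_matches(r, "10.0.0.2", "239.9.9.9") for r in pe2_routes))  # True
```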
9.2. Multicast Join Synch Route
...
Similar commentary for this section about how bits are referred to, both by
index and which "least significant bit" they are.
o Reserved bits MUST be set to 0. They can be defined in future by
other document.
The second sentence is probably unnecessary, as that is implicit; also, "future
document" would be more grammatical.
Modified
9.2.1. Constructing the Multicast Join Synch Route
...
The Flags field indicates the version of IGMP protocol from which the
Membership Report was received. It also indicates whether the
multicast group had INCLUDE or EXCLUDE bit set.
Earlier in the section "INCLUDE" and "EXCLUDE" were not capitalized. One should
be picked.
Reserved bits MUST be set to 0. They can be defined in future by
other document.
Same commentary as before.
Modified