Reviewer: Gyan Mishra
Review result: Ready with Issues
I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair. Please treat these comments just
like any other last call comments.
For more information, please see the FAQ at
<https://trac.ietf.org/trac/gen/wiki/GenArtfaq>.
Document: draft-ietf-bess-evpn-optimized-ir-??
Reviewer: Gyan Mishra
Review Date: 2021-10-02
IETF LC End Date: 2021-09-07
IESG Telechat date: Not scheduled for a telechat
Summary:
I am the Gen-ART reviewer for this draft, and I am reviewing it as a BESS WG
member who is familiar with EVPN technology and the issues that exist with IR,
and who understands the need for an optimized IR solution for BUM replication.
The draft is well written. It clearly defines the problem with EVPN IR PTA BUM
replication, and the proposed EVPN Optimized IR solution is technically sound.
My comments, considerations and recommendations relate to rewriting some of the
technical verbiage to help improve the draft.
The draft describes how the Optimized IR solution with AR replication and
RT-11 can be used to provide an optimized Selective P-Tree, so that not all PEs
have to receive the BUM traffic as they do today with the RT-3 I-PMSI. It
provides an EVPN procedure optimization for the IR PTA RT-3 X-PMSI that
utilizes the new RT-11 Leaf A-D route introduced in Jeffrey Zhang’s EVPN BUM
procedure update draft, draft-ietf-bess-evpn-bum-procedure-updates-10. That
draft uses the RFC 6513 Leaf A-D route to create a new Selective tree Leaf A-D
route for optimized EVPN BUM procedures with inter-AS segmentation, for any PTA
P-Tree being instantiated, including IR.
Leaf Auto-Discovery (A-D) routes [RFC6513]: For explicit leaf
tracking purpose.
The Leaf A-D concept comes from the RFC 6514 Leaf A-D route. It is used by
Multicast in VPLS (RFC 7117 Section 8.3, bottom of page 33) for optimized
selective and inclusive P-Tree X-PMSI tunnels, with or without inter-AS
segmentation, and by draft-ietf-bess-evpn-bum-procedure-updates-10 for P-Tree
multicast. Both specifications refer to RFC 7524 Section 4, the Inter-Area P2MP
Segmented Next-Hop Extended Community (S-NH-EC), which is utilized for tunnel
segmentation in seamless MPLS MVPN multicast, and to the setting of the “Leaf
Information Required” (L) flag in the PTA. The L flag is now used in the EVPN
BUM procedure updates (draft-ietf-bess-evpn-bum-procedure-updates-10 Section
6.3) and also in this EVPN IR optimization draft, for the Assisted Replication
function with RT-11. The caveat is that S-NH-EC itself is not used, which is a
change from RFC 7524 and should be reflected in the verbiage.
RFC 7524 S-NH-EC Section 4
4. Inter-Area P2MP Segmented Next-Hop Extended Community
This document defines a new Transitive IPv4-Address-Specific Extended
Community Sub-Type: "Inter-Area P2MP Next-Hop". This document also
defines a new BGP Transitive IPv6-Address-Specific Extended Community
Sub-Type: "Inter-Area P2MP Next-Hop".
A PE, an ABR, or an ASBR constructs the Inter-Area P2MP Segmented
Next-Hop Extended Community as follows:
- The Global Administrator field MUST be set to an IP address of the
PE, ABR, or ASBR that originates or advertises the route carrying
the P2MP Next-Hop Extended Community. For example this address
may be the loopback address or the PE, ABR, or ASBR that
advertises the route.
- The Local Administrator field MUST be set to 0.
If the Global Administrator field is an IPv4 address, the
IPv4-Address-Specific Extended Community is used; if the Global
Administrator field is an IPv6 address, the IPv6-Address-Specific
Extended Community is used.
The detailed usage of these Extended Communities is described in
the following sections.
Excerpt from Section 6.3 of the BUM procedure update draft, on the use of the
RFC 7524 S-NH-EC; the same verbiage is used in this EVPN IR optimization draft,
Section 4, page 9:
6.3. Use of S-NH-EC
[RFC7524] specifies the use of S-NH-EC because it does not allow ABRs
to change the BGP next hop when they re-advertise I/S-PMSI A-D routes
to downstream areas. That is only to be consistent with the MVPN
Inter-AS I-PMSI A-D routes, whose next hop must not be changed when
they're re-advertised by the segmenting ABRs for reasons specific to
MVPN. For EVPN, it is perfectly fine to change the next hop when
RBRs re-advertise the I/S-PMSI A-D routes, instead of relying on S-
NH-EC. As a result, this document specifies that RBRs change the BGP
next hop when they re-advertise I/S-PMSI A-D routes and do not use S-
NH-EC. If a downstream PE/RBR needs to originate Leaf A-D routes, it
constructs an IP-based Route Target Extended Community by placing the
IP address carried in the Next Hop of the received I/S-PMSI A-D route
in the Global Administrator field of the Community, with the Local
Administrator field of this Community set to 0 and setting the
Extended Communities attribute of the Leaf A-D route to that
Community.
RFC 7117 Excerpt Section 8.3 bottom:
The PE constructs an IP-address-specific RT by placing the IP
address carried in the Next Hop field of the received S-PMSI A-D
route in the Global Administrator field of the Community, with
the Local Administrator field of this Community set to 0 and
setting the Extended Communities attribute of the leaf A-D route
to that Community.
This draft EVPN IR Optimization Section 4 page 9
The AR-LEAF constructs an IP-address-specific route-target as
indicated in [I-D.ietf-bess-evpn-bum-procedure-updates], by
placing the IP address carried in the Next-Hop field of the
received Replicator-AR route in the Global Administrator field
of the Community, with the Local Administrator field of this
Community set to 0. Note that the same IP-address-specific
import route-target is auto-configured by the AR-REPLICATOR
that sent the Replicator-AR, in order to control the acceptance
of the Leaf A-D routes.
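As a rough illustration of the route-target construction quoted in the
excerpts above, the following Python sketch builds the 8-octet value for the
IPv4 case. The helper name build_ip_rt is hypothetical, and the type/sub-type
octets (0x01, 0x02) are assumed from the RFC 4360 IPv4-Address-Specific Route
Target encoding rather than quoted from this draft. IPv6 next hops would need
a different encoding and are not handled here.

```python
import ipaddress
import struct

# Assumption: RFC 4360 transitive IPv4-Address-Specific Extended Community,
# sub-type Route Target. These constants are not quoted from the draft.
EC_TYPE_IPV4_ADDR_SPECIFIC = 0x01
EC_SUBTYPE_ROUTE_TARGET = 0x02


def build_ip_rt(next_hop: str, local_admin: int = 0) -> bytes:
    """Hypothetical helper: encode an IP-address-specific route target.

    The Global Administrator field carries the Next-Hop of the received
    route and the Local Administrator field is 0, as in the excerpts above.
    """
    global_admin = ipaddress.IPv4Address(next_hop).packed  # 4 octets
    return struct.pack(
        "!BB4sH",
        EC_TYPE_IPV4_ADDR_SPECIFIC,
        EC_SUBTYPE_ROUTE_TARGET,
        global_admin,
        local_admin,
    )


# AR-LEAF side: derive the RT from the Replicator-AR route's Next-Hop (AR-IP).
# 192.0.2.1 is a documentation address used only for illustration.
leaf_rt = build_ip_rt("192.0.2.1")
# AR-REPLICATOR side: auto-configure the same value as an import RT.
replicator_import_rt = build_ip_rt("192.0.2.1")
```

Because both sides derive the value from the same address, the Leaf A-D route
is only imported by the AR-REPLICATOR whose AR-IP it was built from.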
The RFC 6514 Leaf A-D route is being used as the EVPN RT-11 to build the
selective tree optimization through a new Assisted Replication (AR) procedure,
which is the EVPN IR optimization defined in this draft.
The confusing part about this draft is that it mentions both NVO3 and MPLS
PTAs. In general, NVO3 overlay encapsulations are used in Data Centers,
typically with an IP-based underlay. However, the MPLS EVPN procedures of RFC
7432 apply to both DC and Core, over any underlay: IP, MPLS or SR. As written,
this draft applies to an NVO3 overlay IR procedure optimization utilized in
Data Centers. The Data Center underlay, like the Core, can be MPLS or IP based,
and both can have an NVO3 overlay, but the Data Center is generally where the
NVO3 VTEP tunnel endpoints reside, while the Core carries the EVPN control
plane inter-DC. RFC 7432 MPLS/IP EVPN supports both IP and MPLS underlays, with
the IP underlay supporting the IR PTA only and the MPLS underlay supporting all
PTAs for the RT-3 I-PMSI inclusive tree.
Here are a few scenarios for the authors to think about, where the EVPN IR
replication optimization solution could be utilized. The point I would like to
make is that, for BUM, a multicast P2MP mLDP or RSVP-TE PTA is always the most
preferred method in both Core and DC scenarios, and multicast would not be
preferred only in certain scenarios. As NVO3 and MPLS can be used in both DC
and Core scenarios, I will mention both, as both pertain to this draft.
If MPLS is used in the DC or Core, then the DC or Core could be “PIM” free and
“BGP” free in the underlay, and the mLDP or RSVP-TE PTA options could be
utilized as the optimal BUM solution. If MPLS is used in the DC or Core, then
the DC or Core could also be “BGP” free but with PIM enabled in the core for
PIM Rosen MDT (RFC 6037). In both of these use cases the IR optimization would
not be the optimal or preferred solution.
If only IP is used in the DC or Core, and “PIM” is not desired in the underlay,
then this IR optimization would be the most desired solution for BUM. In that
case the MVPN PTA options are not possible, because the MVPN procedures of RFC
6513 and RFC 6514 are used with an MPLS underlay, so the only viable PTA is IR;
all the other PTAs have an MPLS underlay dependency. So, in summary, if MPLS
exists then there are many viable X-PMSI PTA options for EVPN NVO3 BUM in both
DC and Core, and IR would not be the desired one. IR is desired only in the
unique case of an IP underlay where “PIM” is not desired.
I believe IR optimization AR
replication solution can be used for MPLS underlay as well as there could be a
use case where even though other PTAs X-PMSI are available it is desired to use
IR as PTA’s that use MPLS based multicast is not desired and in those cases the
IR optimization could be for both DC or Core & could apply to Core “Non NVO3”
use case of EVPN PE-CE AC MLAG All Active Multi-home.
This solution breaks up the BUM 3-tuple (Broadcast, Unknown Unicast, Multicast)
into BM (Broadcast/Multicast) and keeps Unknown Unicast separate, treated the
same as known unicast. MPLS EVPN has a ubiquitous framework and thus ubiquitous
use cases: it can be used for DC or Core and over any underlay (IP, MPLS or
SR). Its two primary use cases are the NVO3 encapsulation overlay for DC
multi-tenant environments, and the NG L2 VPN PE-CE L2 AC advancement that
addresses the VPLS/H-VPLS gaps with NG MPLS L2 VPN “EVPN” E-LINE, E-LAN and
E-TREE. This IR optimization draft should therefore apply to any EVPN use case
and not be limited to NVO3.
The next comment concerns BUM, and why the draft separates Unknown Unicast
handling from Broadcast and Multicast (“BM”) traffic within the BUM 3-tuple.
[jorge] about this:
“I believe IR optimization AR
replication solution can be used for MPLS underlay as well as there could be a
use case where even though other PTAs X-PMSI are available it is desired to use
IR as PTA’s that use MPLS based multicast is not desired and in those cases the
IR optimization could be for both DC or Core & could apply to Core “Non NVO3””
The document is focused on NVO solutions, that is, IP-tunnels, because the AR-REPLICATOR relies on a lookup on the IP tunnel outer IPs. Here NVO assumes the
same as in RFC8365, which is the reference for EVPN as a control plane for IP tunnels. The terminology in this document follows RFC8365, however let us know if something needs to be clarified. When there is an MPLS underlay, an assisted replication solution
is possible based on other procedures that are out of scope, e.g.,
draft-ietf-bess-evpn-virtual-hub-00 . We can state more clearly that MPLS underlay is out of scope if you think it is not clear?
With regard to BUM Broadcast and Unknown Unicast: with Proxy ARP/Proxy ND, when
the broadcast occurs as an ARP all-Fs broadcast, the first ARP packet goes out
and the Type 2 route changes from unknown MAC/IP to MAC when the ARP request is
sent; when the reply is received, the MAC/IP state is created. After that point
no further ARPs are sent for the device. However, most implementations have an
ARP/ND refresh that keeps the MAC/IP state current and purges old entries to
save on MAC VRF URIB state. As a tradeoff there is constant ARP traffic, which
does not necessarily stop even with Proxy ARP. The alternative, without the
ARP/ND refresh, is to maintain a larger MAC VRF, which is worse because you do
not want to hit the ceiling on the MAC VRF.
The draft states that Broadcast is greatly reduced by the Proxy ARP/Proxy ND
capability, and that Unknown Unicast is greatly reduced in virtualized NVO3
networks where the MAC/IP is learned in the control plane. As stated above,
even with Proxy ARP/ND the first ARP packet is flooded as an all-Fs broadcast
until control plane MAC learning generates the Type 2 MAC-IP route. Since most
implementations track the MAC-IP control plane state with a refresh timer to
age out and purge old entries, the all-Fs ARP broadcast ends up being sent more
often than just once.
Unknown unicast is the situation where the switch does not have the MAC address
in its CAM table or, in the EVPN scenario, where the MAC/IP does not exist in
the leaf within the fabric. In an L2 switch environment, unknown unicast caused
by “out of sync” bridge tables can occur when the first hop routing protocol is
salt/peppered even/odd, such that only the Active router has the MAC and the
Standby router does not. In the EVPN All-Active Multi-home MHD/MHN MLAG
scenario of host endpoint connections, both leafs are active, so there is never
an out-of-sync situation where one leaf has the MAC and the other does not.
Also, EVPN backup path, aliasing, uniform load balancing over MLAG and local
bias may take care of Unknown Unicast, making it nil or very rare in an EVPN
NVO3 environment. BUM Broadcast ARP/ND traffic, on the other hand, would
definitely exist even with Proxy ARP/Proxy ND, and it can be quite substantial
due to the refresh/purge timers.
Is the reason for treating the Unknown Unicast differently broken out from “BM”
because none exists in a NVO3 environment?
[jorge] BM and unknown (U) have separate flags in the Pruned-Flood-List flags, indeed. Also BM and U are handled differently since we want Unknown and known
to follow the same path to avoid reordering. As for the PFL flags, the thought was that BM are both multipoint traffic always and the way to handle flooding for both B and M (without snooping mcast protocols) is similar. Unknown traffic is handled differently,
and flooding can be optimized in a typical use-case where we know some compute based NVEs are not interested in receiving unknown traffic, since they always advertise the local MACs in advance.
With regard to EVPN IR optimization for BUM traffic: this draft addresses BUM
optimization when using IR, and draft-ietf-bess-evpn-igmp-mld-proxy defines a
new SMET A-D RT-6 route for IR optimization for BUM, which is equivalent to
this draft's Leaf A-D route but unsolicited and untargeted. This draft must
normatively mention draft-ietf-bess-evpn-igmp-mld-proxy as an alternative
solution for BUM IR optimization, and explain why this solution should be
utilized for BUM IR optimization over the SMET RT-6 style optimization. Also,
how does this draft's RT-11 selective tree AR replication solution interoperate
with the draft-ietf-bess-evpn-igmp-mld-proxy SMET route? Is that possible, or
do you have to implement one or the other?
[jorge] This document is mostly about the Assisted Replication definition as a new Tree type (besides the PFL flags, which are independent). AR can be used
for BUM, as explained in the document, but it can also be used for multicast when igmp/mld proxy is enabled in the EVPN BDs, or even in the case of Optimized Inter-Subnet Multicast (OISM) forwarding – draft-ietf-bess-evpn-irb-mcast. The latter clearly explains
how AR is used in OISM.
Major issues:
None
Minor issues:
Abstract
OLD TXT
Network Virtualization Overlay (NVO) networks using EVPN as control
plane may use Ingress Replication (IR) or PIM (Protocol Independent
Multicast) based trees to convey the overlay Broadcast, Unknown
unicast and Multicast (BUM) traffic. PIM provides an efficient
solution to avoid sending multiple copies of the same packet over the
same physical link, however it may not always be deployed in the NVO
core network. IR avoids the dependency on PIM in the NVO network
core. While IR provides a simple multicast transport, some NVO
networks with demanding multicast applications require a more
efficient solution without PIM in the core. This document describes
a solution to optimize the efficiency of IR in NVO networks.
NEW TXT
Network Virtualization Overlay (NVO) networks and BGP MPLS-based L2 VPN
E-LINE, E-LAN and E-TREE flavors of Ethernet VPN in Service Provider Core and
Data Center networks using EVPN as control plane may use any available PMSI
Tunnel Attribute (PTA), such as Ingress Replication (IR) RFC 7988, PIM
(Protocol Independent Multicast) MDT SAFI RFC 6037, mLDP P2MP/MP2MP RFC 6388 or
RSVP-TE P2MP RFC 4875 based P-Trees, to replicate the overlay Broadcast,
Unknown unicast and Multicast (BUM) traffic. Multicast-based PTA tunnel types
provide an efficient solution to avoid sending multiple copies of the same
packet over the same physical link. However, not all PTA tunnel types may be
available: in a Data Center with an IP-based underlay where native PIM is not
desirable, or with an MPLS-based underlay and a “BGP” and “PIM” free core where
the operator is migrating to Segment Routing, is in the process of eliminating
LDP, and does not find the RSVP-TE P2MP PTA desirable. In these use cases, the
only option available is to use IR. While IR provides a simple multicast
transport, a Service Provider Core migrating to Segment Routing, or Data Center
NVO networks with an IP-based underlay and demanding multicast applications,
require a more efficient solution than IR. This document describes a solution
to optimize the efficiency of IR in a Service Provider Core in transition to
Segment Routing or in a Data Center NVO network with an IP-based underlay.
[jorge] As explained earlier, the document is really focused on EVPN with NVO tunnels, as per RFC8365. So no Segment Routing MPLS or no RSVP/LDP. And for an
NVO solution using EVPN (RFC8365), there is only PIM trees (different flavors of PIM) or IR defined. This document extends the list to AR as well. But I don’t think we should add the MPLS underlay related text in the abstract. Based on this, please let us
know if you still think we need to clarify the abstract.
Introduction
OLD TXT
Ethernet Virtual Private Networks (EVPN) may be used as the control
plane for a Network Virtualization Overlay (NVO) network. Network
Virtualization Edge (NVE) devices and Provider Edges (PEs) that are
part of the same EVPN Instance (EVI) use Ingress Replication (IR) or
PIM-based trees to transport the tenant's Broadcast, Unknown unicast
and Multicast (BUM) traffic. In NVO networks where PIM-based trees
cannot be used, IR is the only option. Examples of these situations
are NVO networks where the core nodes don't support PIM or the
network operator does not want to run PIM in the core.
In some use-cases, the amount of replication for BUM (Broadcast, Unknown
Unicast, Multicast) traffic is kept under control on the NVEs due to the
following fairly common assumptions:
a. Broadcast is greatly reduced due to the proxy ARP (Address
Resolution Protocol) and proxy ND (Neighbor Discovery)
capabilities supported by EVPN on the NVEs. Some NVEs can even
provide Dynamic Host Configuration Protocol (DHCP) server
functions for the attached Tenant Systems (TS) reducing the
broadcast even further.
b. Unknown unicast traffic is greatly reduced in virtualized NVO
networks where all the MAC and IP addresses are learned in the
control plane.
c. Multicast applications are not used.
If the above assumptions are true for a given NVO network, then IR
provides a simple solution for multi-destination traffic. However,
the statement c) above is not always true and multicast applications
are required in many use-cases.
When the multicast sources are attached to NVEs residing in
hypervisors or low-performance-replication TORs (Top Of Rack
switches), the ingress replication of a large amount of multicast
traffic to a significant number of remote NVEs/PEs can seriously
degrade the performance of the NVE and impact the application.
NEW TXT
Service Provider Core and Data Center networks may use Ethernet Virtual Private
Networks (EVPN) as the control plane for a Network Virtualization Overlay (NVO)
network with IP-based underlay, or for BGP MPLS-based L2 VPN E-LINE, E-LAN and
E-TREE flavors of Ethernet VPN. Network Virtualization Edge (NVE) devices and
Provider Edges (PEs) that are part of the same EVPN Instance (EVI) can use
Ingress Replication (IR) or any available MPLS-based PTA for P-Tree
instantiation to transport the tenant's Broadcast, Unknown unicast and
Multicast (BUM) traffic. In Service Provider Core or Data Center NVO networks
where MPLS-based PTAs are not available, such as a Service Provider core
migrating to Segment Routing where LDP is being eliminated and RSVP-TE P2MP is
not desirable, or a Data Center network with IP-based underlay where native PIM
is not desirable, IR is the only option. Examples of these situations are NVO
networks where the core nodes don't support MPLS-based PTAs with a dependency
on mLDP, and where both native PIM and RSVP-TE P2MP LSM are not desirable.
In some use-cases, the amount of replication for BUM traffic is kept
under control on the NVEs due to the following fairly common
assumptions:
a. Broadcast is moderately reduced due to the proxy ARP (Address
Resolution Protocol) and proxy ND (Neighbor Discovery)
capabilities supported by EVPN on the NVEs with the Selective IR
tunnels optimization defined in
draft-ietf-bess-evpn-igmp-mld-proxy. Some NVEs can even
provide Dynamic Host Configuration Protocol (DHCP) server
functions for the attached Tenant Systems (TS), reducing the
broadcast even further. During the Proxy ARP/ND process the first ARP
packet is still sent as an all-Fs broadcast, and the Type 2 route changes
from an Unknown MAC-IP route to a MAC-IP route: when the ARP/ND request is
sent and the reply is received, the MAC VRF MAC-IP state is created. Proxy
ARP/ND then suppresses or proxies all ARP/ND sent by the local hosts.
However, due to the ARP/ND refresh required to keep the MAC-IP state
current and to purge old entries, saving on MAC VRF URIB state, as a
tradeoff there may be additional ARP/ND packets sent for each MAC VRF
MAC-IP entry. The IGMP-MLD proxy Selective IR tunnel optimization draft
improves the performance of IR using the SMET route and may be used in
conjunction with this draft. Even though Proxy ARP/ND suppression
techniques are utilized, the refresh/purge must be implemented to age out
old entries and control the MAC VRF size, so the broadcast traffic is only
moderately reduced. Thus RFC 7432 EVPN IR for BUM is not a viable
solution without the IR optimization solution defined in this draft
and/or draft-ietf-bess-evpn-igmp-mld-proxy.
*** Please investigate whether both EVPN IR optimizations can be used together,
what all the caveats are, and, if they cannot be used together, why. *** The
main point that should be mentioned here is that Broadcast traffic is reduced,
but there is still a considerable amount of broadcast traffic that needs to be
optimized.
b. Unknown unicast traffic is eliminated in virtualized NVO
networks, because all the MAC and IP addresses are learned in the
control plane, for the All-Active Multi-home LAG scenario, and reduced for
the Single-Active Multi-Home EVPN scenario. Unknown unicast is a situation
where the packet has the IP and MAC but the switch is missing the
MAC entry. It occurs when the Layer 2 switch BD tables become
unsynchronized, due to salt and pepper of the first hop router
redundancy active router VLAN between L2 switches. In an EVPN scenario
with All-Active-Multi-Home the MAC-IP
remains synchronized with ESI auto discovery, however with
Single-Active-Multi-Home the MAC-IP may not be synchronized, resulting in
Unknown unicast. As a result, there is minimal to no Unknown Unicast
in an NVO network.
c. Multicast applications are not used.
If the above assumptions are true for a given NVO network, then IR
provides a simple solution for multi-destination traffic. However,
the statement c) above is not always true and multicast applications
are required in many use-cases.
When the multicast sources are attached to NVEs residing in
hypervisors or low-performance-replication TORs (Top Of Rack
switches), the ingress replication of a large amount of multicast
traffic to a significant number of remote NVEs/PEs can seriously
degrade the performance of the NVE and impact the application.
The draft should mention the reason why BM (Broadcast & Multicast) is treated
differently by this solution than Unknown Unicast. My answer is that Unknown
Unicast is minimal to none, so it does not need the optimization.
[jorge] after explaining that MPLS underlay is out of scope, please let us know if it is ok to keep the OLD text. About AR tunnels being used in igmp/mld proxy
and OISM, yes, compatibility is perfectly okay, but since this document just defines the AR tree and predates draft-ietf-bess-evpn-igmp-mld-proxy and draft-ietf-bess-evpn-irb-mcast I don’t think this document needs to refer to the other two drafts, but rather
the opposite.
Terminology section:
OLD TXT
- Regular-IR: Refers to Regular Ingress Replication, where the
source NVE/PE sends a copy to each remote NVE/PE part of the BD.
- IR-IP: IP address used for Ingress Replication as in [RFC7432].
- AR-IP: IP address owned by the AR-REPLICATOR and used to
differentiate the ingress traffic that must follow the AR
procedures.
NEW TXT
- Regular-IR: an EVPN RT-3 (Route Type 3) Regular Ingress Replication, where
the source NVE/PE sends a copy to each remote NVE/PE part of the BD.
- IR-IP: PTA Tunnel endpoint identifier which carries the unicast tunnel
endpoint (loopback) IP address of the Non-AR-Replicator local PE used for
Ingress Replication as defined in RFC 6514.
[jorge] The document uses the IR-IP as just an IP address, that can be used in the PTA tunnel identifier but also in the route’s next-hop and originating IP,
also the outer IP DA of the packets is compared against this IP. So I don’t think the above is correct. I prefer to keep the current definition and then have the text explaining how to use the IR-IP.
- AR-IP: PTA Tunnel endpoint identifier which carries the unicast tunnel
endpoint (loopback) IP address of the AR-REPLICATOR local PE as defined in RFC
6514 and used to differentiate the ingress traffic that must follow the AR
procedures.
[jorge] similar to the IR-IP comment.. AR-IP is an IP address of the replicator, the text later explains how to use it (not only in the PTA).
I updated the definitions to say what the AR-IP and IR-IP basically are: the
PMSI Tunnel attribute (PTA) termination endpoint ID, AR-IP for the AR node and
IR-IP for the Non-AR node.
[jorge] AR-IP and IR-IP are not limited to the PTA, please see above.
RFC 7432 Section 11.2 references the RFC 6514 PMSI Tunnel attribute, which must
contain the identity of the tree.
RFC 7432 Section 11.2
11.2. P-Tunnel Identification
In order to identify the P-tunnel used for sending broadcast, unknown
unicast, or multicast traffic, the Inclusive Multicast Ethernet Tag
route MUST carry a Provider Multicast Service Interface (PMSI) Tunnel
attribute as specified in [RFC6514].
+ If the PE that originates the advertisement uses ingress
replication for the P-tunnel for EVPN, the route MUST include the
PMSI Tunnel attribute with the Tunnel Type set to Ingress
Replication and the Tunnel Identifier set to a routable address of
the PE.
RFC 6514 Section 5
When the Tunnel Type is set to Ingress Replication, the Tunnel
Identifier carries the unicast tunnel endpoint IP address of the
local PE that is to be this PE's receiving endpoint address for the
tunnel.
Section 3 Solution Requirements
OLD TXT
a. It provides an IR optimization for BM (Broadcast and Multicast)
traffic without the need for PIM, while preserving the packet
order for unicast applications, i.e., known and unknown unicast
traffic should follow the same path. This optimization is
required in low-performance NVEs.
NEW TXT
a. It provides an IR optimization for BM (Broadcast and Multicast)
traffic without the need for PTA’s with MPLS or PIM based dependencies,
while preserving the packet order for unicast applications, i.e., known
and unknown unicast traffic should follow the same path. This
optimization is required in low-performance NVEs.
[jorge] similar comment as earlier about MPLS trees, they are out of scope of the document.
How does the IR optimization preserve unicast ordering?
Normal unicast traffic is not BUM and thus would not use the EVPN IR
optimization AR mechanism.
[jorge] the solution uses AR for BM and unknown/known unicast follow regular ingress replication, that’s how they can follow the same path. This is the relevant
text, let us know if it requires clarification:
This solution recommends the replication of BM through the
AR-REPLICATOR node, whereas unknown/known unicast will be delivered
directly from the source node to the destination node without being
replicated by any intermediate node. Unknown unicast SHALL follow
the same path as known unicast traffic in order to avoid packet
reordering for unicast applications and simplify the control and data
plane procedures.
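The forwarding split described in the quoted text can be sketched as follows.
This is only an illustration under stated assumptions: the names Path and
classify are hypothetical, the MAC table is modelled as a plain set of
strings, and BM traffic is detected from the group bit of the destination MAC
address. It is not taken from the draft.

```python
from enum import Enum
from typing import Set


class Path(Enum):
    # BM: one copy goes to the AR-REPLICATOR, which replicates it.
    AR_REPLICATOR = "send one copy to the AR-REPLICATOR"
    # Known unicast: sent directly to the destination NVE/PE.
    DIRECT_UNICAST = "send directly to the destination NVE/PE"
    # Unknown unicast: regular ingress replication from the source node,
    # so it takes the same direct path as known unicast.
    DIRECT_FLOOD = "ingress-replicate directly to each remote NVE/PE"


def classify(dst_mac: str, known_macs: Set[str]) -> Path:
    """Hypothetical helper: pick the delivery path for one frame."""
    first_octet = int(dst_mac.split(":")[0], 16)
    if first_octet & 0x01:  # group bit set: broadcast or multicast (BM)
        return Path.AR_REPLICATOR
    if dst_mac.lower() in known_macs:
        return Path.DIRECT_UNICAST
    return Path.DIRECT_FLOOD


mac_table = {"00:11:22:33:44:55"}
paths = [
    classify("ff:ff:ff:ff:ff:ff", mac_table),  # broadcast
    classify("01:00:5e:00:00:01", mac_table),  # multicast
    classify("00:11:22:33:44:55", mac_table),  # known unicast
    classify("00:aa:bb:cc:dd:ee", mac_table),  # unknown unicast
]
```

Neither unicast case passes through the AR-REPLICATOR, which is the property
the quoted text relies on to avoid reordering.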
Section 4: Type 3 is being extended to support optimized IR (a new Type 3), so
that it is part of the capability exchange.
4. EVPN BGP Attributes for optimized-IR
This solution extends the [RFC7432] Inclusive Multicast Ethernet Tag
routes and attributes so that an NVE/PE can signal its optimized-IR
capabilities.
RFC 7432 Section 7.3
7.3. Inclusive Multicast Ethernet Tag Route
An Inclusive Multicast Ethernet Tag route type specific EVPN NLRI
consists of the following:
+---------------------------------------+
| RD (8 octets) |
+---------------------------------------+
| Ethernet Tag ID (4 octets) |
+---------------------------------------+
| IP Address Length (1 octet) |
+---------------------------------------+
| Originating Router's IP Address |
| (4 or 16 octets) |
+---------------------------------------+
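To make the NLRI layout above concrete, here is a minimal Python sketch that
packs the route-type-specific part of an Inclusive Multicast Ethernet Tag
route. The function name encode_imet_nlri is hypothetical, the example RD and
addresses are illustrative only, and expressing the IP Address Length in bits
(32 or 128) is an assumption to be checked against RFC 7432.

```python
import ipaddress
import struct


def encode_imet_nlri(rd: bytes, ethernet_tag: int, originator: str) -> bytes:
    """Hypothetical helper: pack RD, Ethernet Tag ID, IP length and IP."""
    if len(rd) != 8:
        raise ValueError("RD must be exactly 8 octets")
    ip = ipaddress.ip_address(originator).packed  # 4 or 16 octets
    ip_len_bits = len(ip) * 8  # assumption: length is expressed in bits
    return rd + struct.pack("!IB", ethernet_tag, ip_len_bits) + ip


# Illustrative Type 1 RD: 2-octet type, 4-octet IPv4 address, 2-octet number.
rd = struct.pack("!H4sH", 1, ipaddress.IPv4Address("192.0.2.1").packed, 100)
nlri = encode_imet_nlri(rd, ethernet_tag=0, originator="192.0.2.1")
```

For an IPv4 originator this yields 17 octets: 8 for the RD, 4 for the Ethernet
Tag ID, 1 for the length and 4 for the address.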
Please reference below with RFC 6514 Section 5
5. PMSI Tunnel Attribute
This document defines and uses a new BGP attribute called the
"P-Multicast Service Interface Tunnel (PMSI Tunnel) attribute". This
is an optional transitive BGP attribute. The format of this
attribute is defined as follows:
+---------------------------------+
| Flags (1 octet) |
+---------------------------------+
| Tunnel Type (1 octets) |
+---------------------------------+
| MPLS Label (3 octets) |
+---------------------------------+
| Tunnel Identifier (variable) |
+---------------------------------+
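The PMSI Tunnel attribute layout above can be sketched in Python as below.
The helper name encode_pta is hypothetical. The value 6 for Ingress
Replication matches the "Tunnel Type (0x06)" quoted later in this review, and
the L flag is assumed to be the least significant bit of the Flags octet per
RFC 6514. The AR tunnel type and the AR-specific flag bits are deliberately
not modelled, since their values are allocated by the draft itself.

```python
import ipaddress

TUNNEL_TYPE_INGRESS_REPLICATION = 6  # Tunnel Type 0x06, Ingress Replication
FLAG_LEAF_INFO_REQUIRED = 0x01       # assumption: RFC 6514 L flag, lowest bit


def encode_pta(flags: int, tunnel_type: int, label_field: int,
               tunnel_id: str) -> bytes:
    """Hypothetical helper: pack Flags, Tunnel Type, Label and Tunnel ID.

    label_field is the raw 24-bit value. For an MPLS label the caller
    places the 20-bit label in the high-order bits (label << 4).
    """
    if not 0 <= label_field < 2 ** 24:
        raise ValueError("label field must fit in 3 octets")
    endpoint = ipaddress.ip_address(tunnel_id).packed
    return bytes([flags, tunnel_type]) + label_field.to_bytes(3, "big") + endpoint


# Regular-IR style attribute: no flags, Ingress Replication, label 10000,
# Tunnel Identifier set to an illustrative endpoint address.
pta = encode_pta(0, TUNNEL_TYPE_INGRESS_REPLICATION, 10000 << 4, "192.0.2.1")
```

With an IPv4 Tunnel Identifier the attribute value is 9 octets long.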
[jorge] are you suggesting this document should refer to RFC6514? This document is based on RFC7432 and RFC8365, both of which make use of the PTA in RFC6514,
but that should have been already referenced in those… ?
Section 4 top of page 8
As described in the summary section of the review, this section should
reference RFC 7524 Section 4, which is referenced by
“draft-ietf-bess-evpn-bum-procedure-updates” Section 6.3 (S-NH-EC), and the
text used by RFC 7117 Section 8.3. It should also describe that, per
“draft-ietf-bess-evpn-bum-procedure-updates”, for EVPN the S-NH-EC is not
necessary in the Leaf A-D routes sent in response to the Replicator-AR route
(RT-3). This should be included in the verbiage.
[jorge] hmm… is that necessary given that we already refer to I-D.ietf-bess-evpn-bum-procedure-updates?
I updated some normative language – please check
OLD TXT
In this document, the above RT-3 and PTA can be used in two different
modes for the same BD:
- Regular-IR route: in this route, Originating Router's IP Address,
Tunnel Type (0x06), MPLS Label and Tunnel Identifier MUST be used
as described in [RFC7432] when Ingress Replication is in use. The
NVE/PE that advertises the route will set the Next-Hop to an IP
address that we denominate IR-IP in this document. When
advertised by an AR-LEAF node, the Regular-IR route SHOULD be
advertised with type T= AR-LEAF.
- Replicator-AR route: this route is used by the AR-REPLICATOR to
advertise its AR capabilities, with the fields set as follows:
o Originating Router's IP Address MUST be set to an IP address of
the PE that should be common to all the EVIs on the PE (usually
this is the PE's loopback address). The Tunnel Identifier and
Next-Hop SHOULD be set to the same IP address as the
Originating Router's IP address when the NVE/PE originates the
route. The Next-Hop address is referred to as the AR-IP and
SHOULD be different than the IR-IP for a given PE/NVE.
o Tunnel Type = Assisted-Replication Tunnel. Section 11 provides
the allocated type value.
o T (AR role type) = 01 (AR-REPLICATOR).
o L (Leaf Information Required) = 0 (for non-selective AR) or 1
(for selective AR).
In addition, this document also uses the Leaf A-D route (RT-11)
defined in [I-D.ietf-bess-evpn-bum-procedure-updates] in case the
selective AR mode is used. The Leaf A-D route MAY be used by the AR-
LEAF in response to a Replicator-AR route (with the L flag set) to
advertise its desire to receive the BM traffic from a specific AR-
REPLICATOR. It is only used for selective AR and its fields are set
as follows:
o Originating Router's IP Address is set to the advertising PE's
IP address (same IP used by the AR-LEAF in regular-IR routes).
The Next-Hop address is set to the IR-IP.
o Route Key is the "Route Type Specific" NLRI of the Replicator-
AR route for which this Leaf A-D route is generated.
o The AR-LEAF constructs an IP-address-specific route-target as
indicated in [I-D.ietf-bess-evpn-bum-procedure-updates], by
placing the IP address carried in the Next-Hop field of the
received Replicator-AR route in the Global Administrator field
of the Community, with the Local Administrator field of this
Community set to 0. Note that the same IP-address-specific
import route-target is auto-configured by the AR-REPLICATOR
that sent the Replicator-AR, in order to control the acceptance
of the Leaf A-D routes.
o The Leaf A-D route MUST include the PMSI Tunnel attribute with
the Tunnel Type set to AR, type set to AR-LEAF and the Tunnel
Identifier set to the IP of the advertising AR-LEAF. The PMSI
Tunnel attribute MUST carry a downstream-assigned MPLS label or
VNI that is used by the AR-REPLICATOR to send traffic to the
AR-LEAF.
Each AR-enabled node MUST understand and process the AR type field in
the PTA (Flags field) of the routes, and MUST signal the
corresponding type (1 or 2) according to its administrative choice.
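The IP-address-specific route-target construction described in the OLD TXT
above can be sketched as follows. This is only an illustrative sketch, not
text from the draft: it assumes an IPv4 Next-Hop and the RFC 4360
IPv4-Address-Specific Extended Community layout (type 0x01, Route Target
sub-type 0x02), and the helper name leaf_ad_route_target is hypothetical.

```python
import ipaddress
import struct

# Assumption: IPv4 next hop, encoded as an RFC 4360 IPv4-Address-Specific
# Extended Community (type 0x01) with the Route Target sub-type (0x02).
IPV4_ADDR_SPECIFIC_TYPE = 0x01
ROUTE_TARGET_SUBTYPE = 0x02


def leaf_ad_route_target(replicator_next_hop: str) -> bytes:
    """Hypothetical helper: build the 8-octet route target for a Leaf A-D route.

    Global Administrator = Next-Hop of the received Replicator-AR route.
    Local Administrator  = 0, as stated in the quoted text.
    """
    global_admin = ipaddress.IPv4Address(replicator_next_hop).packed  # 4 octets
    local_admin = 0
    return struct.pack(
        "!BB4sH",
        IPV4_ADDR_SPECIFIC_TYPE,
        ROUTE_TARGET_SUBTYPE,
        global_admin,
        local_admin,
    )


# Example with a documentation address standing in for the AR-IP.
rt = leaf_ad_route_target("192.0.2.1")
print(rt.hex())
```

The AR-REPLICATOR would derive the same value from its own AR-IP and use it as
an import route-target, which is how the quoted text limits acceptance of
Leaf A-D routes to the intended replicator.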
NEW TXT
Where the PTA builds the PMSI tunnel per RFC 6514, I changed what the section
called the IR-IP to PTA-ID to make it easier for the reader, since the source /
destination of the PMSI tunnel termination endpoints is the PMSI Tunnel
Attribute (PTA) Tunnel Identifier.
[jorge] I don’t think we should change that since it changes the intended meaning.
**start of the new txt**
[jorge] please check out my comments and let us know what changes you would like to make based on them.
In this document, the above RT-3 and PTA can be used in two different
modes for the same BD:
- Regular-IR route: This route is the regular RT-3 I-PMSI route. The
Originating Router's Unicast IP Address, called the IR-IP, MUST be set to
the PMSI Tunnel Identifier for the PTA Tunnel Type (0x06) used for IR, as
described in RFC 6514, when Ingress Replication is used. The NVE/PE that
advertises the route will set the Next-Hop to the remote tunnel endpoint
PMSI Tunnel Identifier (IR-IP) as defined in RFC 6514. When advertised by
an AR-LEAF node, the Regular-IR route MUST be advertised with type T=
AR-LEAF.
[jorge] I don’t agree we should reference RFC6514, since this IR route is the same route defined in RFC7432, and we refer to RFC7432.
o Tunnel Type = Assisted-Replication Tunnel. Section 11 provides
the allocated type value.
o T (AR role type) = 10 (AR-LEAF).
o L (Leaf Information Required) = 0 (for non-selective AR=0) or
(for selective AR=1). The Regular-IR route is used only for Non
Selective P-Tree.
- Replicator-AR route: This route is used by the AR-REPLICATOR to
advertise its AR capabilities, with the fields set as follows:
o Originating Router's Unicast IP Address, called the AR-IP, MUST be set to
the PMSI Tunnel Identifier for the PTA Tunnel Type (0x06), which is the IP
address of the PE that should be common to all the EVIs on the PE, as
defined in RFC 6514. The Tunnel Identifier and Next-Hop MUST be set to
the same IP address as the Originating Router's IP address (PTA Tunnel ID)
when the NVE/PE originates the route, as described in RFC 6514. The
Next-Hop address of the Replicator-AR route as seen on the AR-LEAF is
referred to as the AR-IP and MUST be unique; it cannot be the same as the
IR-IP for a given PE/NVE.
o Tunnel Type = Assisted-Replication Tunnel. Section 11 provides
the allocated type value.
o T (AR role type) = 01 (AR-REPLICATOR).
o L (Leaf Information Required) = 1 (for non-selective AR=0) or
(for selective AR=1). Replicator-AR route is only used for Selective
P-Tree.
In addition, this document also uses the Leaf A-D route (RT-11)
defined in [I-D.ietf-bess-evpn-bum-procedure-updates] in case the
selective AR mode is used. Draft ietf-bess-evpn-bum-procedure-updates
updates the EVPN BUM procedures for EVPN Multicast optimized selective
trees, introducing three new route types (RT-9 Per Region I-PMSI A-D, RT-10
S-PMSI A-D and RT-11 Leaf A-D) utilized for Selective P-Tree PTA inter-AS
segmentation optimizations. It utilizes the RFC 7117 concept of a selective
tree optimization procedure, which signals a Leaf A-D route to instantiate
the inter-AS P-Tree framework from the Intra-AS and Inter-AS VPLS Multicast
I/S-PMSI A-D & Leaf A-D solution. That concept is now also leveraged by the
AR-REPLICATOR for IR optimization, utilizing RT-11 to build a selective tree
for BUM traffic. Section 6 of bess-evpn-bum-procedure-updates defines the
RT-11 Leaf A-D route selective tree optimization concept, taken from the
RFC 7117 response to the I-PMSI route, and the RFC 7524 Inter-Area P2MP
Segmented Next Hop Extended Community (S-NH-EC), which is utilized for
Inter-AS P2MP Segmented LSP stitching. RFC 7524 Section 6 states that it
requires the ABRs to keep the next hop unchanged when re-advertising an
I/S-PMSI A-D route, which only needs to be consistent for MVPN Inter-AS
I-PMSI A-D routes, whose next hop MUST be unchanged. In EVPN, for inter-AS
re-advertisement of an I/S-PMSI A-D route, the next hop can be changed, so
EVPN does not need to rely on S-NH-EC.
[jorge] if we have a reference to I-D.ietf-bess-evpn-bum-procedure-updates, I don’t think we need references to RFC7524 or 7117. This document follows the
RT-11 use as defined in I-D.ietf-bess-evpn-bum-procedure-updates, with the specifics described in the document. Besides, as discussed, MPLS underlay is out of scope.
The Leaf A-D route MAY be used by the AR-LEAF in response to a
Replicator-AR route (with the L flag set) to advertise its desire to receive
the BM traffic from a specific AR-REPLICATOR. It is only used for selective AR
and its fields are set as follows:
o Originating Router's IP Address is set to the advertising PE's
IP address (same IP used by the AR-LEAF in regular-IR routes).
The Next-Hop address is set to the IR-IP.
o Route Key is the "Route Type Specific" NLRI of the Replicator-
AR route for which this Leaf A-D route is generated.
o The AR-LEAF constructs an IP-address-specific route-target as
indicated in [I-D.ietf-bess-evpn-bum-procedure-updates], by
placing the IP address carried in the Next-Hop field of the
received Replicator-AR route in the Global Administrator field
of the Community, with the Local Administrator field of this
Community set to 0. Note that the same IP-address-specific
import route-target is auto-configured by the AR-REPLICATOR
that sent the Replicator-AR, in order to control the acceptance
of the Leaf A-D routes.
o The Leaf A-D route MUST include the PMSI Tunnel attribute with
the Tunnel Type set to AR, type set to AR-LEAF and the Tunnel
Identifier set to the IP of the advertising AR-LEAF. The PMSI
Tunnel attribute MUST carry a downstream-assigned MPLS label or
VNI that is used by the AR-REPLICATOR to send traffic to the
AR-LEAF.
Each AR-enabled node MUST understand and process the AR type field in
the PTA (Flags field) of the routes, and MUST signal the
corresponding type (1 or 2) according to its administrative choice.
**There are a few different flags & new flags defined in the PTA - please be
specific as to the type 1 & 2 flags**
[jorge] sure, we can do that in the next revision.
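To make the request about the type 1 and type 2 values concrete, the AR type
and L flag in the PTA Flags octet could be encoded along the lines below. This
is a hedged sketch, not text from the draft: the T values 01 (AR-REPLICATOR)
and 10 (AR-LEAF) come from the quoted text, while the bit positions of T, the
meanings of values 0 and 3, and all names here are assumptions that should be
checked against the Flags figure in the draft.

```python
from enum import IntEnum


class ARType(IntEnum):
    RNVE = 0           # assumption: 00 means no AR role is signaled
    AR_REPLICATOR = 1  # T = 01, per the quoted text
    AR_LEAF = 2        # T = 10, per the quoted text
    RESERVED = 3       # assumption: 11 is not assigned


T_SHIFT = 3                # assumption: T sits in bits 3-4 (MSB-0 numbering)
T_MASK = 0b11 << T_SHIFT
L_FLAG = 0x01              # Leaf Information Required: low-order bit (RFC 6514)


def encode_flags(ar_type: ARType, leaf_info_required: bool) -> int:
    """Pack the AR type and L flag into a single Flags octet."""
    return (int(ar_type) << T_SHIFT) | (L_FLAG if leaf_info_required else 0)


def decode_flags(flags: int) -> tuple[ARType, bool]:
    """Recover the AR type and L flag from a Flags octet."""
    return ARType((flags & T_MASK) >> T_SHIFT), bool(flags & L_FLAG)


# Replicator-AR route for selective AR: T = 01, L = 1.
selective_replicator = encode_flags(ARType.AR_REPLICATOR, True)
print(f"{selective_replicator:08b}", decode_flags(selective_replicator))
```

Under these assumptions, "type 1" and "type 2" in the quoted sentence would
correspond to ARType.AR_REPLICATOR and ARType.AR_LEAF respectively.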
***Implementation considerations section: important, and it should also detail
how backwards compatibility works*** As RT-3 introduces a mode and RT-11 is
new in this draft, what devices need to be upgraded, and do all of them need
to be upgraded to support the solution? ***Implementation section: please list
any vendor implementations thus far*** Also mention any issues found with any
implementations, as well as any operators that have deployed the implementation.
[jorge] we can elaborate on this, but the RNVE role is actually a non-upgraded node, so backwards compatibility is always present in the document. About implementations,
there are two vendors with shipping code for this afaik. There was public interop testing at the EANTC 2020, and a public white paper exists with the details of the interop testing for AR. As this document is in Last Call and seeks publication, I fail to see
how including details on interop testing between specific vendors helps the document. That information will be outdated as more vendors implement it.
Nits/editorial comments:
Normative references should be added, per the rewritten text provided in the
Minor issues section, for the following:
RFC 7524 Inter-AS P2MP Segmented LSP, RFC 7117 Multicast VPLS,
draft-ietf-bess-evpn-igmp-mld-proxy, RFC 6388 mLDP, RFC 6037 MDT SAFI, and
RFC 4875 P2MP TE
[jorge] As discussed, I don’t think we need to refer to the above due to the reasons stated.
Informative reference to MVPN procedures RFC 6513 MVPN, RFC 7988 Ingress
Replication, RFC 7348 VXLAN, RFC 8926 GENEVE
[jorge] this document does not use any of the procedures in 6513 or 7988 as discussed. It uses some pieces in RFCs 7432, 8365 and the bum-procedures draft
and those are referenced already. Also yes, we can add references to VXLAN and GENEVE, but there is no text in the document that refers to those two, so not sure why we would need references to 7348 or 8926 either?