Re: [Last-Call] [RTG-DIR] Rtgdir last call review of draft-ietf-mpls-p2mp-bfd-06

Joel Halpern <jmh@xxxxxxxxxxxxxxx> · Sat, 24 Feb 2024 23:35:59 -0500

In my experience (and I did charter the PIM and MPLS working groups in
ancient history), the distinction between pt2mp for MPLS and IP
Multicast is not about scale.  I do not think any of the defining
documents deal with what scale they aim at.  IP Multicast includes SSM,
which is pt2mp, and ASM, which is mp2mp.

I actually tend to doubt that the BFD error indications will cause data 
plane congestion, given the relative rates and expected scales.  But it 
is our job to say so.  Then other folks can decide whether we have
sufficiently addressed the questions.

Yours,

Joel
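The "relative rates" argument can be checked with back-of-the-envelope arithmetic. The sketch below assumes that every tail sends one notification per second (the rate cited in the Routing Directorate review quoted in this thread), IPv4/UDP encapsulation with no BFD authentication section, and it ignores MPLS label and link-layer overhead. The function name is hypothetical.

```python
# Back-of-the-envelope estimate of the inbound notification load at the root.
# Assumptions (not taken from the draft): IPv4/UDP encapsulation, no BFD
# authentication section, MPLS label and link-layer overhead not counted.
BFD_MANDATORY_SECTION = 24  # bytes, BFD Control packet mandatory section (RFC 5880)
UDP_HEADER = 8              # bytes
IPV4_HEADER = 20            # bytes, no IP options
PACKET_BYTES = BFD_MANDATORY_SECTION + UDP_HEADER + IPV4_HEADER  # 52 bytes


def inbound_load_bps(num_tails: int, pkts_per_sec_per_tail: float = 1.0) -> float:
    """Bits per second arriving at the root if every tail notifies at once."""
    return num_tails * pkts_per_sec_per_tail * PACKET_BYTES * 8


if __name__ == "__main__":
    for tails in (100, 1_000, 10_000):
        kbps = inbound_load_bps(tails) / 1_000
        print(f"{tails:>6} tails -> {tails} pkt/s, {kbps:.1f} kbit/s")
```

Under these assumptions, 1,000 simultaneously notifying tails add roughly 0.4 Mbit/s and 10,000 tails about 4.2 Mbit/s. That is small for most links, which supports the doubt about data plane congestion. The corresponding 10,000 packets per second punted to the root's control plane may be a far more plausible stress point.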

On 2/24/2024 10:56 PM, loa@xxxxx wrote:
Greg, Joel, all

Traditionally we have distinguished between "p2mp" for MPLS, and
"multicast" for IP. An IP multicast service might easily reach a "large
number of leaves", while MPLS p2mp is more of a "transport" service
where the number of leaves is moderate.

I'm not saying that that "moderate number" might not cause the problems
Greg and Joel discuss, but it might be a good idea to think a bit about
the scale. How many leaves are required to cause:

- data plane congestion?
- control plane overload?

Currently I don't see any data plane problems (correct me if I'm wrong),
while control plane overload is a possibility.

/Loa

Mostly.  There is one other aspect.  You may consider it irrelevant, in
which case we can simply say so.  Can the inbound notifications coming
from a large number of leaves at the same time cause data plane
congestion?

Yours,

Joel

On 2/24/2024 8:44 PM, Greg Mirsky wrote:
Hi Joel,
thank you for your quick response. I consider two risks that may
stress the root's control plane:

   * notifications transmitted by the leaves reporting the failure of
     the p2mp LSP
   * notifications transmitted by the root to every leaf closing the
     Poll sequence

As I understand it, you refer to the former as inbound congestion and
to the latter as outbound congestion. Is that correct? I agree that
even the inbound stream of notifications may overload the root's
control plane, and the outbound process further increases the
probability of congestion in the control plane. My proposal is to
apply a rate limiter to control the inbound flow of BFD Control
messages punted to the control plane.
What would you suggest in addition to the proposed text?

Best regards,
Greg

On Sat, Feb 24, 2024 at 3:28 PM Joel Halpern
<jmh.direct@xxxxxxxxxxxxxxx> wrote:

     What you say makes sense.  I think we need to acknowledge the
     inbound congestion risk, even if we choose not to try to
     ameliorate it.  Your approach seems to address the outbound
     congestion risk from the root.

     Yours,

     Joel

     On 2/24/2024 6:25 PM, Greg Mirsky wrote:
     Hi Joel,
     thank you for the clarification. My idea is to use a rate limiter
     at the root of the p2mp LSP that may receive notifications from
     the leaves affected by the failure. I imagine that the threshold
     of the rate limiter might be exceeded and some notifications will
     be discarded. As a result, some notifications will be processed
     by the headend of the p2mp BFD session later, as the tails
     transmit notifications periodically until they receive the BFD
     Control message with the Final flag set.  Thus, we cannot avoid
     the congestion, but we can mitigate its negative effect at the
     cost of extending the convergence. Does that make sense?

     Regards,
     Greg
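The trade-off Greg describes (dropped notifications are not lost, only delayed, because tails keep retransmitting until they see the Final flag) can be illustrated with a toy model. This is a hypothetical sketch, not behavior specified by the draft. It assumes that all tails detect the failure at the same instant, that each pending tail sends one notification per second, that the root admits at most `limit_per_sec` notifications per second, and that each admitted notification is answered with a Final that reaches the tail within the same second.

```python
# Toy model: a root-side rate limiter stretches convergence without losing
# tails. Hypothetical assumptions (not from the draft): every tail detects
# the failure at t=0, sends one notification per second until it receives
# a packet with the Final (F) bit, and the root passes at most
# `limit_per_sec` notifications per second to its control plane.

def seconds_to_converge(num_tails: int, limit_per_sec: int) -> int:
    """Return the number of 1-second rounds until every tail has its Final."""
    if limit_per_sec <= 0:
        raise ValueError("limit_per_sec must be positive")
    pending = num_tails  # tails still retransmitting notifications
    rounds = 0
    while pending > 0:
        rounds += 1
        # All pending tails transmit this second; the limiter admits only
        # `limit_per_sec` of them and drops the rest.
        admitted = min(pending, limit_per_sec)
        # Each admitted notification is answered with a unicast Final,
        # so those tails stop retransmitting; dropped ones retry next round.
        pending -= admitted
    return rounds


if __name__ == "__main__":
    print(seconds_to_converge(1_000, 200))    # 5
    print(seconds_to_converge(1_000, 1_000))  # 1
```

In this model convergence takes ceil(N / R) seconds for N tails and a limit of R notifications per second, so the limiter bounds control plane load at the price of a proportionally longer convergence.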

     On Sat, Feb 24, 2024 at 2:39 PM Joel Halpern
     <jmh@xxxxxxxxxxxxxxx> wrote:

         That covers part of my concern.  But...  A failure near the
         root means that a lot of leaves will see the failure, and they
         will all send notifications converging on the root.  Those
         notifications themselves, not just the final messages, seem
         able to cause congestion.  I am not sure what can be done
         about it, but we aren't allowed to ignore it.

         Yours,

         Joel

         On 2/24/2024 3:34 PM, Greg Mirsky wrote:
         Hi Joel,
         thank you for your support of this work and the suggestion.
         Would the following update of the last paragraph of Section
         5 help:
         OLD TEXT:
            An ingress LSR that has received the BFD Control packet, as
            described above, sends the unicast IP/UDP encapsulated BFD
            Control packet with the Final (F) bit set to the egress LSR.
         NEW TEXT:
            As described above, an ingress LSR that has received the BFD
            Control packet sends the unicast IP/UDP encapsulated BFD
            Control packet with the Final (F) bit set to the egress LSR.
            In some scenarios, e.g., when a p2mp LSP is broken close to
            its root, and the number of egress LSRs is significantly
            large, the control plane of the ingress LSR might be
            congested by the BFD Control packets transmitted by egress
            LSRs and the process of generating unicast BFD Control
            packets, as noted above.  To mitigate that, a BFD
            implementation that supports this specification is
            RECOMMENDED to use a rate limiter of received BFD Control
            packets passed to processing in the control plane of the
            ingress LSR.

         Regards,
         Greg
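The proposed text recommends a rate limiter but does not say which kind. A token bucket is one common choice. The sketch below is a hypothetical illustration of such a limiter applied to BFD Control packets punted to the ingress LSR's control plane. The class name, the `rate` and `burst` parameters, and the injectable clock are assumptions made for illustration and do not appear in the draft.

```python
import time


class TokenBucket:
    """Token-bucket limiter for BFD Control packets punted to the control plane.

    Hypothetical sketch: `rate` is tokens added per second and `burst` is the
    bucket depth. A packet that finds no token is dropped; because tails keep
    retransmitting until they receive the Final flag, a drop delays that
    tail's notification rather than losing it.
    """

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        if rate <= 0 or burst <= 0:
            raise ValueError("rate and burst must be positive")
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)  # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        """Return True if one packet may be passed to the control plane."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    t = [0.0]  # fake clock so the example is deterministic
    bucket = TokenBucket(rate=100, burst=50, clock=lambda: t[0])
    # 1,000 notifications arrive at the same instant: only the burst passes.
    print(sum(bucket.allow() for _ in range(1_000)))  # 50
    t[0] += 1.0  # one second later the bucket has refilled to its depth
    print(sum(bucket.allow() for _ in range(1_000)))  # 50
```

The burst size bounds how much work a simultaneous failure of many tails can push into the control plane at once, while the rate sets the sustained processing load. Both values would be deployment-specific choices.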

         On Thu, Feb 22, 2024 at 4:10 PM Joel Halpern via Datatracker
         <noreply@xxxxxxxx> wrote:

             Reviewer: Joel Halpern
             Review result: Ready

             Hello,

             I have been selected as the Routing Directorate reviewer for
             this draft. The Routing Directorate seeks to review all
             routing or routing-related drafts as they pass through IETF
             last call and IESG review, and sometimes on special request.
             The purpose of the review is to provide assistance to the
             Routing ADs. For more information about the Routing
             Directorate, please see
             https://wiki.ietf.org/en/group/rtg/RtgDir

             Although these comments are primarily for the use of the
             Routing ADs, it would be helpful if you could consider them
             along with any other IETF Last Call comments that you
             receive, and strive to resolve them through discussion or by
             updating the draft.

             Document: draft-name-version
             Reviewer: your-name
             Review Date: date
             IETF LC End Date: date-if-known
             Intended Status: copy-from-I-D

             Summary:  This document is ready for publication as a
             Proposed Standard.  I do have one question that I would
             appreciate being considered.

             Comments:
                 The document is clear and readable, with careful
                 references for those needing additional details.

             Major Issues: None

             Minor Issues:
                 I note that the security considerations section
                 (Section 6) does refer to congestion issues caused by
                 excessive transmission of BFD requests.  I wonder if
                 Section 5 ("Operation of Multipoint BFD with Active Tail
                 over P2MP MPLS LSP") should include a discussion of the
                 congestion implications of multiple tails sending
                 notifications at the rate of 1 per second to the head
                 end, particularly if the failure is near the head end.
                 While I suspect that the 1 / second rate is low enough
                 for this to be safe, discussion in the document would
                 be helpful.

--
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call