Hi John,

please see inline.

> Hi Valery,
>
> Thanks for your review. Some responses inline below.
>
> > On Jul 4, 2023, at 4:06 AM, Valery Smyslov via Datatracker <noreply@xxxxxxxx> wrote:
> >
> > Reviewer: Valery Smyslov
> > Review result: Has Issues
> >
> > I have reviewed this document as part of the security directorate's
> > ongoing effort to review all IETF documents being processed by the
> > IESG. These comments were written primarily for the benefit of the
> > security area directors. Document editors and WG chairs should treat
> > these comments just like any other last call comments.
> >
> > The document defines a new BGP capability "Long-lived Graceful Restart Capability"
> > that allows stale routes to be retained for a longer time than is currently allowed
> > by RFC 4724. The document is well written and is easy to understand.
>
> Thank you!

You're welcome :-)

> > My concern is that the upper limit for the "Long-lived Stale Time" period is 2^24 - 1 seconds
> > (about 194 days) and the document doesn't specify any restrictions for this value.
>
> I’m not sure if this is different from what you meant by “any restrictions”, but Section 4.2 has "These
> timers MAY be modified by local configuration." After discussing it with my co-authors, we agree that
> this is too easy to overlook, and propose to change it to "The timers received in the Long-lived Graceful
> Restart Capability SHOULD be modifiable by local configuration, which may impose either an upper or a
> lower bound, or both, on their respective values." Then, we return to this in our updated Security
> Considerations section, read on.

OK.

> > It seems to me that having such long lived stale routes may open new possibilities for attackers.
> > In particular, a possibility of a resource exhausting for storing a lot of stale routes
> > for a very long time leading to a DoS attack come first to my mind.
> > This possibility is not mentioned in the Security Considerations.
>
> We worked through several scenarios and as best we can determine, this is adequately covered under
> "The security implications of the LLGR mechanism defined in this document are akin to those incurred by
> the maintenance of stale routing information within a network." The outline looks like:
>
> 1. To successfully mount a DoS attack against the network, the attacker has to be able to inject a large
> number of routes. If an attacker can do that, it’s a pre-existing vulnerability, not one created by LLGR.
> 2. The new vulnerability would be, if the DoS in (1) can be exacerbated by keeping the garbage routes
> stored in the network even after the attack against the proximate victim has been remediated.
> 2.a. But, if the attack is remediated, for instance by resetting the BGP session from the attacker to the
> victim (either manually, or as a result of the operation of an automatic defense feature such as
> max-prefixes), then the routes would promptly be flushed from the network as a consequence of the normal
> operation of the BGP protocol.
> 2.b. So, in order for the attack to succeed, the proximate victim would have to be prevented from
> withdrawing the routes. Ergo, the attacker would have to have the ability to not only inject routes in (1),
> but subsequently to silence the victim router (e.g. by crashing it into a non-recoverable state).
> 2.c. Even if that scenario were to be carried out (which implies underlying vulnerabilities probably more
> concerning than the LLGR resource-exhaustion vulnerability itself) the victim router’s next hop would
> disappear from the IGP, which would cause the LLGR routes to become non-resolvable, removing them
> from the FIB. Granted that RIB resources would still be consumed for the duration of the attack or the
> LLST, whichever is shorter, but in general FIB, not RIB, resources are the bottleneck.

I was mostly thinking of something like 2.b.
You are in a better position to analyze this scenario, so if you think that it is not a real threat, then I trust you.

> We’re not absolutely opposed to including an analysis like the above in the Security Considerations, but
> pending any further discussion, we’re comfortable with leaving it at the brief outline that’s already
> present. We did add one sentence to the introductory paragraph, so
>
> OLD:
> The security implications of the LLGR mechanism defined in this document are akin to those incurred by
> the maintenance of stale routing information within a network.
>
> NEW:
> The security implications of the LLGR mechanism defined in this document are akin to those incurred by
> the maintenance of stale routing information within a network. However, since the retention time may
> potentially be much longer, the window during which certain attacks are feasible may be substantially
> increased.

Fine with me, thank you.

> > Then, it seems to me that the countermeasures suggested in Section 6 to avoid VPN breach
> > may not work for large values of the "Long-lived Stale Time" period.
> >
> > And a final nit: the last para of Section 6 looks to me like some sort of excuse, which
> > in my opinion is not appropriate for a technical document. No matter how complex an attack is,
> > if it is ever feasible with the given threat model, then we should just describe it
> > with no additional sentiments that it is hard. Perhaps it is better to describe possible
> > attacks in terms of attacker's capabilities. E.g.: "If an attacker is able to inject packets
> > into the network then the following attacks are possible...".
>
> Thanks for challenging us on these! Happily, the rewrite to fix the latter also led to improving the clarity
> of exposition regarding the countermeasure.
> Your point is still correct of course, that if it’s impossible to
> find a viable configuration that prevents overlap of label allocation reuse time and LLST, then the attack
> can’t be entirely ruled out; I hope the proposed text is sufficiently clear on this point. I’ve pasted the
> proposed update below.
>
> OLD:
> Therefore, to avoid VPN breach, before enabling BGP LLGR for a VPN
> address family, Service Providers need to check how fast a given
> label can be reused by a PE, taking into account:
>
> * The load of the BGP route churn on a PE (in terms of the number of
>   VPN labels advertised and the churn rate).
>
> * The label allocation policy on the PE (possibly depending upon the
>   size of the pool of the VPN labels (which can be restricted by
>   hardware considerations or other MPLS usages), the label
>   allocation scheme (for example per route or per VRF/CE), the
>   re-allocation policy (for example least recently used label).
>
> Note that [RFC4781] which defines Graceful Restart Mechanism for BGP
> with MPLS is also applicable to BGP LLGR.
>
> These considerations notwithstanding, the LLGR mechanism described
> within this document is considered to be complex to exploit
> maliciously - in order to inject packets into a topology, there is a
> requirement to engineer a specific LLGR state between two PE devices,
> whilst engineering label reallocation to occur in a manner that
> results in the two topologies overlapping. Such allocation is
> particularly difficult to engineer (since it is typically an internal
> mechanism of a router).
>
> NEW:
> In order to exploit the vulnerability described above, there is a
> requirement to engineer a specific LLGR state between two PE devices,
> whilst engineering label reallocation to occur in a manner that
> results in the two topologies overlapping.
> Therefore, to avoid the
> potential for a VPN breach, before enabling BGP LLGR for a VPN
> address family, the operator should endeavor to ensure that the lower
> bound on when a label might be reused is greater than the upper bound
> on LLST. Section 4.2 discusses the provision of an upper bound on LLST.
> Details of features for setting a lower bound on label reuse time are
> beyond the scope of this document; however, factors that might need
> to be taken into account when setting this value include:
>
> * The load of the BGP route churn on a PE (in terms of the number of
>   VPN labels advertised and the churn rate).
>
> * The label allocation policy on the PE (possibly depending upon the
>   size of the pool of the VPN labels (which can be restricted by
>   hardware considerations or other MPLS usages), the label
>   allocation scheme (for example per route or per VRF/CE), the
>   re-allocation policy (for example least recently used label).
>
> Note that [RFC4781] which defines Graceful Restart Mechanism for BGP
> with MPLS is also applicable to BGP LLGR.

Thank you, this text is much better.

> We’ll post a version 06 with the updates as soon as possible. Thanks again for your review.

No problem :-)

Regards,
Valery.

> --John

--
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call
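
Illustrative note: the numbers and the bound comparison discussed in this thread can be sketched in a few lines of Python. This is only a rough sketch, not anything taken from the draft or from an implementation. All function and parameter names are hypothetical, and treating "local configuration ... may impose either an upper or a lower bound" as a simple clamp of the received value is an assumption about how a bound might be applied.

```python
# Illustrative sketch only; every name here is hypothetical, not from the draft.
from typing import Optional

LLST_MAX_SECONDS = 2**24 - 1   # upper limit of Long-lived Stale Time cited in the review
SECONDS_PER_DAY = 86_400


def llst_max_days() -> float:
    """Largest possible LLST, expressed in days."""
    return LLST_MAX_SECONDS / SECONDS_PER_DAY


def effective_llst(received: int,
                   lower: Optional[int] = None,
                   upper: Optional[int] = None) -> int:
    """Clamp a received LLST (seconds) to optional locally configured bounds.

    Assumption: a configured bound simply clamps the received value.
    """
    if not 0 <= received <= LLST_MAX_SECONDS:
        raise ValueError("LLST must be between 0 and 2^24 - 1 seconds")
    if lower is not None and upper is not None and lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    value = received
    if lower is not None:
        value = max(value, lower)
    if upper is not None:
        value = min(value, upper)
    return value


def label_reuse_is_safe(min_label_reuse_seconds: int,
                        llst_upper_bound_seconds: int) -> bool:
    """True when the earliest possible label reuse falls after the longest LLST.

    This mirrors the condition in the proposed NEW text: the lower bound on
    label reuse time should be greater than the upper bound on LLST.
    """
    return min_label_reuse_seconds > llst_upper_bound_seconds


if __name__ == "__main__":
    print(f"max LLST: {LLST_MAX_SECONDS} s, roughly {llst_max_days():.1f} days")
    # Hypothetical operator policy: cap LLST at 7 days.
    capped = effective_llst(LLST_MAX_SECONDS, upper=7 * SECONDS_PER_DAY)
    print(f"effective LLST with a 7-day cap: {capped} s")
    # Hypothetical PE that never reuses a label sooner than 10 days.
    print("label reuse safe:", label_reuse_is_safe(10 * SECONDS_PER_DAY, capped))
```

The 7-day cap and the 10-day reuse interval are made-up values for the example. The point is only that the comparison is meaningful once an upper bound on LLST is actually configured; without one, the bound defaults to the full 2^24 - 1 seconds.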