Re: [Last-Call] Secdir last call review of draft-ietf-idr-long-lived-gr-05



Hi Valery,

Thanks for your review. Some responses inline below.

> On Jul 4, 2023, at 4:06 AM, Valery Smyslov via Datatracker <noreply@xxxxxxxx> wrote:
> 
> Reviewer: Valery Smyslov
> Review result: Has Issues
> 
> I have reviewed this document as part of the security directorate's
> ongoing effort to review all IETF documents being processed by the
> IESG.  These comments were written primarily for the benefit of the
> security area directors.  Document editors and WG chairs should treat
> these comments just like any other last call comments.
> 
> The document defines a new BGP capability "Long-lived Graceful Restart Capability"
> that allows stale routes to be retained for a longer time than is currently allowed
> by RFC 4724. The document is well written and is easy to understand.

Thank you!

> My concern is that the upper limit for the "Long-lived Stale Time" period is 2^24 - 1 seconds
> (about 194 days) and the document doesn't specify any restrictions for this value.

I’m not sure if this is different from what you meant by “any restrictions”, but Section 4.2 has "These timers MAY be modified by local configuration.” After discussing it with my co-authors, we agree that this is too easy to overlook, and propose to change it to "The timers received in the Long-lived Graceful Restart Capability SHOULD be modifiable by local configuration, which may impose either an upper or a lower bound, or both, on their respective values.” Then, we return to this in our updated Security Considerations section, read on.
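To make the proposed wording concrete, here is a minimal sketch of what "local configuration which may impose an upper or a lower bound, or both" could look like when applied to a received LLST. It assumes only what the review states: the field is 24 bits, in seconds. The function and parameter names (`effective_llst`, `local_min`, `local_max`) are hypothetical and do not describe any particular implementation.

```python
# Hypothetical sketch: clamp a Long-lived Stale Time (LLST) received in the
# LLGR Capability to locally configured bounds. Names are illustrative only.
from typing import Optional

LLST_FIELD_MAX = 2**24 - 1  # 24-bit field, in seconds (per the review above)
SECONDS_PER_DAY = 86_400


def effective_llst(received: int,
                   local_min: Optional[int] = None,
                   local_max: Optional[int] = None) -> int:
    """Return the LLST to use after applying optional local bounds."""
    if not 0 <= received <= LLST_FIELD_MAX:
        raise ValueError("LLST must fit in 24 bits")
    if local_min is not None and local_max is not None and local_min > local_max:
        raise ValueError("local lower bound exceeds local upper bound")
    value = received
    if local_max is not None:
        value = min(value, local_max)  # operator-imposed upper bound
    if local_min is not None:
        value = max(value, local_min)  # operator-imposed lower bound
    return value


if __name__ == "__main__":
    # The largest encodable LLST, expressed in days (about 194).
    print(round(LLST_FIELD_MAX / SECONDS_PER_DAY, 2))
    # A peer advertising the maximum is capped at one day locally.
    print(effective_llst(LLST_FIELD_MAX, local_max=SECONDS_PER_DAY))
```

With an upper bound configured, a peer advertising the maximum value can no longer hold stale routes for roughly half a year.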

> It seems to me that having such long lived stale routes may open new possibilities for attackers.
> In particular, the possibility of resource exhaustion from storing a lot of stale routes
> for a very long time, leading to a DoS attack, comes first to my mind.
> This possibility is not mentioned in the Security Considerations.

We worked through several scenarios and as best we can determine, this is adequately covered under "The security implications of the LLGR mechanism defined in this document are akin to those incurred by the maintenance of stale routing information within a network." The outline looks like:

1. To successfully mount a DoS attack against the network, the attacker has to be able to inject a large number of routes. If an attacker can do that, it’s a pre-existing vulnerability, not one created by LLGR.
2. The new vulnerability would arise if the DoS in (1) could be exacerbated by keeping the garbage routes stored in the network even after the attack against the proximate victim has been remediated.
  2.a. But, if the attack is remediated, for instance by resetting the BGP session from the attacker to the victim (either manually, or as a result of the operation of an automatic defense feature such as max-prefixes), then the routes would promptly be flushed from the network as a consequence of the normal operation of the BGP protocol.
  2.b. So, in order for the attack to succeed, the proximate victim would have to be prevented from withdrawing the routes. Ergo, the attacker would have to have the ability to not only inject routes in (1), but subsequently to silence the victim router (e.g. by crashing it into a non-recoverable state). 
  2.c. Even if that scenario were to be carried out (which implies underlying vulnerabilities probably more concerning than the LLGR resource-exhaustion vulnerability itself), the victim router’s next hop would disappear from the IGP, which would cause the LLGR routes to become non-resolvable, removing them from the FIB. Granted, RIB resources would still be consumed for the duration of the attack or the LLST, whichever is shorter, but in general FIB resources, not RIB resources, are the bottleneck.
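For readers who prefer it spelled out, the outcomes in 2.a through 2.c can be summarized as a toy decision function for a router that learned the routes from the victim. This is an illustrative model of the argument only, not any vendor's code; the names (`RouteState`, `stale_route_state`, `seconds_stale`) are hypothetical, and it deliberately ignores the ordinary GR restart-time phase that precedes the long-lived stale phase.

```python
# Toy model (illustrative only) of the fate of routes learned from the victim,
# following steps 2.a-2.c above. Not an implementation of any router.
from enum import Enum


class RouteState(Enum):
    WITHDRAWN = "withdrawn"      # removed by normal BGP operation or LLST expiry
    RIB_AND_FIB = "rib_and_fib"  # stale, still resolvable, used for forwarding
    RIB_ONLY = "rib_only"        # stale, next hop unresolvable: RIB cost only


def stale_route_state(victim_withdrew: bool,
                      seconds_stale: int,
                      llst: int,
                      next_hop_resolvable: bool) -> RouteState:
    if victim_withdrew:
        # 2.a: remediation at the victim leads to normal withdrawals.
        return RouteState.WITHDRAWN
    if seconds_stale >= llst:
        # The long-lived stale period has run out; stale routes are deleted.
        return RouteState.WITHDRAWN
    if not next_hop_resolvable:
        # 2.c: the silenced victim's next hop has left the IGP.
        return RouteState.RIB_ONLY
    return RouteState.RIB_AND_FIB
```

The only case that consumes FIB resources requires the victim to be silenced while its next hop stays resolvable, which is the combination the outline above argues is unlikely.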

We’re not absolutely opposed to including an analysis like the above in the Security Considerations, but pending any further discussion, we’re comfortable with leaving it at the brief outline that’s already present. We did add one sentence to the introductory paragraph, as follows:

OLD:
The security implications of the LLGR mechanism defined in this document are akin to those incurred by the maintenance of stale routing information within a network.

NEW:
The security implications of the LLGR mechanism defined in this document are akin to those incurred by the maintenance of stale routing information within a network. However, since the retention time may potentially be much longer, the window during which certain attacks are feasible may be substantially increased.   

> Then, it seems to me that the countermeasures suggested in Section 6 to avoid VPN breach
> may not work for large values of the "Long-lived Stale Time" period.
> 
> And a final nit: the last para of Section 6 looks to me like some sort of excuse, which
> in my opinion is not appropriate for a technical document. No matter how complex an attack is,
> if it is ever feasible with the given threat model, then we should just describe it
> with no additional sentiments that it is hard. Perhaps it is better to describe possible
> attacks in terms of attacker's capabilities. E.g.: "If an attacker is able to inject packets
> into the network then the following attacks are possible...".

Thanks for challenging us on these! Happily, the rewrite to fix the latter also led to improving the clarity of exposition regarding the countermeasure. Your point is still correct, of course: if it’s impossible to find a viable configuration that prevents overlap of label allocation reuse time and LLST, then the attack can’t be entirely ruled out. I hope the proposed text is sufficiently clear on this point. I’ve pasted the proposed update below.

OLD:
   Therefore, to avoid VPN breach, before enabling BGP LLGR for a VPN
   address family, Service Providers need to check how fast a given
   label can be reused by a PE, taking into account:

   *  The load of the BGP route churn on a PE (in terms of the number of
      VPN labels advertised and the churn rate).

   *  The label allocation policy on the PE (possibly depending upon the
      size of the pool of the VPN labels (which can be restricted by
      hardware considerations or other MPLS usages), the label
      allocation scheme (for example per route or per VRF/CE), the re-
      allocation policy (for example least recently used label).

   Note that [RFC4781] which defines Graceful Restart Mechanism for BGP
   with MPLS is also applicable to BGP LLGR.

   These considerations notwithstanding, the LLGR mechanism described
   within this document is considered to be complex to exploit
   maliciously - in order to inject packets into a topology, there is a
   requirement to engineer a specific LLGR state between two PE devices,
   whilst engineering label reallocation to occur in a manner that
   results in the two topologies overlapping.  Such allocation is
   particularly difficult to engineer (since it is typically an internal
   mechanism of a router).

NEW:
   In order to exploit the vulnerability described above, there is a
   requirement to engineer a specific LLGR state between two PE devices,
   whilst engineering label reallocation to occur in a manner that
   results in the two topologies overlapping.  Therefore, to avoid the
   potential for a VPN breach, before enabling BGP LLGR for a VPN
   address family, the operator should endeavor to ensure that the lower
   bound on when a label might be reused is greater than the upper bound
   on LLST.  Section 4.2 discusses the provision of an upper bound on LLST.
   Details of features for setting a lower bound on label reuse time are
   beyond the scope of this document; however, factors that might need
   to be taken into account when setting this value include:

   *  The load of the BGP route churn on a PE (in terms of the number of
      VPN labels advertised and the churn rate).

   *  The label allocation policy on the PE, possibly depending upon the
      size of the pool of VPN labels (which can be restricted by
      hardware considerations or other MPLS usages), the label
      allocation scheme (for example, per route or per VRF/CE), and the
      reallocation policy (for example, least recently used label).

   Note that [RFC4781], which defines the Graceful Restart Mechanism for
   BGP with MPLS, is also applicable to BGP LLGR.
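As an illustration of the condition in the NEW text (the lower bound on label reuse time should exceed the upper bound on LLST), here is a rough sketch of how an operator might sanity-check a configuration. It assumes least-recently-used reallocation and a known peak label allocation rate; both are assumptions, and the function names and numbers are hypothetical.

```python
# Hypothetical sketch of the NEW text's condition: lower bound on label reuse
# time must exceed the upper bound on LLST. Assumes an LRU free-label list.

def min_label_reuse_seconds(free_labels: int, peak_allocs_per_second: float) -> float:
    """Lower bound on how soon a just-released label could be reallocated.

    free_labels counts the free pool including the just-released label.
    Under LRU reallocation the released label joins the tail of the free
    list, so the free_labels - 1 labels ahead of it must all be handed out
    first, at no more than the peak allocation rate.
    """
    if free_labels < 1 or peak_allocs_per_second <= 0:
        raise ValueError("need at least one free label and a positive rate")
    return (free_labels - 1) / peak_allocs_per_second


def llgr_safe_for_vpn(free_labels: int, peak_allocs_per_second: float,
                      llst_upper_bound: int) -> bool:
    """True if no label can be reused before the LLST upper bound expires."""
    reuse = min_label_reuse_seconds(free_labels, peak_allocs_per_second)
    return reuse > llst_upper_bound


if __name__ == "__main__":
    # 500,000 free labels at 10 allocations/s gives roughly 50,000 s.
    print(llgr_safe_for_vpn(500_000, 10, 86_400))  # one-day LLST cap
    print(llgr_safe_for_vpn(500_000, 10, 3_600))   # one-hour LLST cap
```

With these illustrative numbers, a one-day LLST cap would not be safe, while a one-hour cap would. Other allocation policies (for example, per-VRF/CE allocation) would need a different reuse estimate.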

We’ll post a version 06 with the updates as soon as possible. Thanks again for your review.

—John 

-- 
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call
