On Wed, Sep 25, 2013 at 12:38 PM, George, Wes <wesley.george@xxxxxxxxxxx> wrote:
>> From: christopher.morrow@xxxxxxxxx [mailto:christopher.morrow@xxxxxxxxx]
>>
>> [CLM]
>> In the RPKI cache example, 'consumer' is 'routers in your network'.
>> 'Close' is 'close enough that bootstrapping isn't a problem', balanced
>> with 'gosh, maybe I don't want to put one on top of each router! plus
>> associated management headaches to deal with these new
>> systems/appliances'.
>
> [WEG] that's part of my issue - the only way that you get "close enough that
> bootstrapping isn't a problem" is when the cache and router are directly

there's some baseline that's acceptable, you intimate that IGP comes
up before EGP below. that makes some sense, and thus maybe the target
is 'in your igp, close enough that fiber failures won't be a problem'
then?

> connected. Otherwise there *is* going to be some amount of time while
> the router is coming up that it can't talk to its configured caches e.g. while

but the data in the cache only REALLY matters for bgp validation... so
your IGP clue below isn't unreasonable.

> it learns the route(s) to the cache(s). I think that supports a recommendation
> to put the caches in your IGP instead of BGP, so that you get faster

I actually didn't note a [ie]GP recommendation in the doc.

> convergence of those routes and therefore have access to the cache
> when BGP comes up and starts converging, rather than once BGP is
> partially converged. But the draft doesn't say that.

ok

> The question is, does the propagation/convergence delay for an IGP in an
> average network (let's call it somewhere between subsecond and 5 seconds)
> make a non-trivial difference in RPKI's bootstrap behavior, especially since
> BGP convergence is also dependent on IGP convergence? Can we make a
> clearer recommendation of the performance envelope we're shooting for so
> that people can design accordingly?
> I'm not sure I buy a general "faster (or
> closer) is always better" recommendation - at some point, we hit diminishing
> returns, given that this is mostly a human time-scale system. The document
> doesn't provide clear guidance on how to balance that tradeoff.

I think a bunch of this really also depends on the operator deploying
though... 'it's hard to get server people to do X for me' or 'gosh,
these appliances can be managed by network-operations! and they are
cheap-ish' or 'gosh, we don't have 1gbps ports anymore in general,
crap...'

I do think the original intent was to not dictate: "Must be 5ms from
the router, or else!!" and rely upon the operator to do the tradeoff
you just made above. Each network is different in its expectations
from the infra, and each has different igp/egp designs as well as
fiber plant restrictions.

I think it's going to be rough going making a recommendation much more than:
  1) make sure the cache is available before BGP starts to converge
     for a device

and I actually can't come up with something else that's super helpful :(
even the above might be 'too much advice', if your plan is to accept
all routes and simply de-pref until validation might happen then
re-evaluate as you can.

>> [CLM]
>> I guess one way is to say: "People should understand the dependencies
>> and engineer appropriately" ... which you kind of asked to not say in
>> the original comment. (or is the issue that the dependencies aren't
>> clear?)
>
> [WEG] The issue is that the dependencies aren't clear. I'm not expecting the
> text to be too prescriptive here, because all networks are different, but I need
> enough technical discussion to properly "understand the dependencies and
> engineer accordingly". This is an operational considerations document, so it
> needs to tell operators what breaks if they don't do it as recommended. If this

ok...

> is about bootstrapping, then we need to be clearer about the relationship
> between bootstrapping and network convergence (since recommending
> that the cache is directly connected to the router is impractical) and how
> it impacts RPKI cache-router communication and performance. If it's about
> reducing latency via proximity, then we need to explain how much latency is
> too much latency and why. If it's about proper geographic diversity within a
> network's topology, then we need to say that. If we don't actually know if it
> makes a difference, and so are defaulting to recommendations that most folks
> agree are generally a good idea, we should say that. But right now we're
> assuming too much, IMO.

ok, the current text is:

  " As RPKI-based origin validation relies on the availability of RPKI
    data, operators SHOULD locate caches close to routers that require
    these data and services.  'Close' is, of course, complex.  One should
    consider trust boundaries, routing bootstrap reachability, latency,
    etc"

Maybe something like:

  " As RPKI-based origin validation relies on the availability of RPKI
    data, operators SHOULD locate caches close enough to routers that
    require these data and services such that failures in local device
    routing domain do not impact cache availability.  One should
    consider trust boundaries, routing bootstrap reachability, latency,
    etc"

-chris
(content warning removed.. since it didn't come from TWC, and my words
are not as restricted)
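
For illustration only, the "accept all routes and simply de-pref until
validation might happen then re-evaluate" plan mentioned above could be
sketched roughly as below. This is a toy model, not router or draft
behavior: the names (Route, import_route, reevaluate), the local-pref
values, the exact-match lookup, and the choice to keep invalid routes
at the lower preference are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Validity(Enum):
    VALID = "valid"
    INVALID = "invalid"
    NOT_FOUND = "not-found"


# Hypothetical preference values; a real network picks its own.
LOCAL_PREF_DEFAULT = 100
LOCAL_PREF_DEPREF = 50

# Hypothetical lookup signature: (prefix, origin AS) -> Validity.
Validator = Callable[[str, int], Validity]


@dataclass
class Route:
    prefix: str
    origin_as: int
    local_pref: int = LOCAL_PREF_DEFAULT
    validity: Optional[Validity] = None  # None: not evaluated yet


def import_route(route: Route, cache_ready: bool,
                 validate: Optional[Validator] = None) -> Route:
    """Accept every route; de-pref it if the cache cannot be consulted."""
    if not cache_ready or validate is None:
        route.validity = None
        route.local_pref = LOCAL_PREF_DEPREF
        return route
    route.validity = validate(route.prefix, route.origin_as)
    # Assumption: invalid routes stay de-preferred rather than dropped.
    if route.validity is Validity.INVALID:
        route.local_pref = LOCAL_PREF_DEPREF
    else:
        route.local_pref = LOCAL_PREF_DEFAULT
    return route


def reevaluate(routes: List[Route], validate: Validator) -> List[Route]:
    """Once the cache is reachable, revisit routes accepted unvalidated."""
    for route in routes:
        if route.validity is None:
            import_route(route, True, validate)
    return routes


if __name__ == "__main__":
    # Toy exact-match table standing in for cache data (documentation
    # prefixes and a documentation AS number).
    table = {("192.0.2.0/24", 64500): Validity.VALID}

    def lookup(prefix: str, origin_as: int) -> Validity:
        return table.get((prefix, origin_as), Validity.NOT_FOUND)

    early = import_route(Route("192.0.2.0/24", 64500), cache_ready=False)
    print(early.local_pref, early.validity)
    reevaluate([early], lookup)
    print(early.local_pref, early.validity)
```

In this model a router whose cache is unreachable at bootstrap still
installs routes, which is why "make sure the cache is available before
BGP starts to converge" may be more advice than such an operator needs.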