Hello Juliusz
Please see below
cheers,
Pascal
An applicability statement covering the various possibilities would be useful in the future. It could be a paper or an RFC. At least, it would make sense to have an applicability section here. For instance, IoT may experience large and asymmetric delays.
Section 1 describes the conditions in which we know the protocol to be applicable: cross-continent overlay networks. At this time, we are not proposing this protocol for IoT-style applications, which have completely different time scales than cross-continent overlay networks. We encourage people interested in IoT to borrow from our ideas; we'll take it as a compliment.
Right; I'm not sure the conditions are that clear, though. I was
looking for words stating that the latency has to be stable and
measurable: a limited jitter/latency ratio (which is not
necessarily the case for IoT), and values one or more orders of
magnitude above the timing precision.
Note that for the given example, the speed of light will certainly have measurable effects. But going to Orleans and back may be hidden inside, e.g., wireless delays.
Yes, Orléans is 500µs away from Paris, way below the recommended value of rtt-min (Section 4.2), and will therefore be classified as a local link.
Meaning that the algorithm is not meant to make that distinction. This
is again an interesting point for the applicability statement (it
applies where the $ cost is not critical).
Indeed, I'm concerned with the effect of bufferbloat, which could create oscillations exactly like the early ARPANET load-based metric.
So are we, and that's what the whole of Section 4 is about.
I had a knee-jerk reaction because the text there did not say
that. Maybe a sentence there would resolve my reader panic.
The bit that you quoted explicitly references Section 3.4.2 of RFC 8966. Are you suggesting that we need to repeat the contents of RFC 8966 here? Please clarify.
Just one sentence saying what an IHU is would save the novice reader
a trip to the main RFC. In this case, though, I expect the intended
reader knows perfectly well what IHUs are, so maybe just expanding IHU
does the trick.
Reference IEEE 1588? There are many profiles for it; maybe this work could be presented as one.
I'm not sure what you're suggesting exactly. Please clarify.
You are defining an operation that seems to inherit from 1588. This inheritance is typically expressed as a profile. Some things come with a profile and need to be made explicit to make the story complete. But maybe it's overkill. I leave it to your discretion whether to look at it and mention it.
It is important to indicate which timestamps are used (e.g., where in the stack t1 is measured). Do we measure the latency inside the sender, meaning that the timestamp is that of the software above, or do we measure starting at MAC enqueue, or starting at PHY XMIT?
The implementation note in Section 3.4 recommends timestamping just before the call to sendmsg. I'll see if I can add some normative language to this effect.
Please add some words referencing Section 3.4 and explain that the
latency below the API depends on the link load but not on the
system load.
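The following minimal Python sketch illustrates the recommendation to timestamp just before the call to sendmsg. The helper names (now_us, send_with_timestamp) and the packet layout are hypothetical simplifications. The layout places the payload first and appends a 32-bit big-endian microsecond timestamp; it is not the draft's actual sub-TLV encoding. Work done before the timestamp is taken, such as building the packet or waiting for the CPU, is excluded from the sample. Queuing below the API, in the kernel and the MAC/PHY queues, is still included.

```python
import socket
import struct
import time


def now_us() -> int:
    # Monotonic clock in microseconds: only differences matter for RTT.
    return time.monotonic_ns() // 1000


def send_with_timestamp(sock: socket.socket, payload: bytes) -> int:
    """Append a timestamp taken immediately before sendmsg, then send.

    Hypothetical layout: payload + 32-bit big-endian microsecond timestamp.
    System load before this point is not measured; kernel and link
    queuing after sendmsg is.
    """
    # Take the timestamp as late as possible, right before handing the
    # packet to the kernel.
    t1 = now_us()
    packet = payload + struct.pack("!I", t1 & 0xFFFFFFFF)
    sock.sendmsg([packet])
    return t1
```

For a local test, a connected pair from socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM) is enough. A real Babel speaker would use a UDP socket and include the timestamp in its Hello TLV.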
For short distance / high precision as claimed in the introduction,
There is no such claim in the introduction. The paragraph that confused you was only meant to point at potentially interesting further research; I'll remove it.
"In this document, we specify an extension to the Babel routing protocol that enables precise measurement of the round-trip time (RTT) of a link, and allows its usage in metric computation. Since this causes a negative feedback loop, special care is needed to ensure that the resulting network is reasonably stable (Section 4)."
Maybe the term "precise" there is the trouble; also, maybe the
word "stable" should qualify the link loads and their respective
latencies, as opposed to vaguely the network.
In principle, this algorithm is inaccurate in the presence of clock drift (i.e., when A's and B's clocks are running at different frequencies). However, t2' - t1' is usually on the order of seconds, and significant clock drift is unlikely to happen at that time scale.
Back to the applicability of the work: I believe some expectations on clock drift vs. RTT can be set for modern hardware. Nodes have an idea of which clock they use and what drift they have. The draft could recommend that the clocking error be two orders of magnitude less than the RTTs that the protocol measures; otherwise, the measurement cannot be trusted.
With the default parameters used by Babel, the time between Hello and IHU is 2s on average. A cheap crystal oscillator, such as those used in consumer electronics, has a typical drift of 10ppm (30ppm worst case), leading to an error of 20µs (60µs worst case). I also don't see where the "two orders of magnitude" figure comes from. The goal of this protocol is to disambiguate between local and distant routes, not to accurately determine the physical properties of links.
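The arithmetic in this exchange can be checked with a short Python sketch. It assumes the four-timestamp scheme implied by the quoted text:

- A sends at t1 on A's clock.
- B receives at t1' and replies at t2', both on B's clock.
- A receives the reply at t2.

This gives RTT = (t2 - t1) - (t2' - t1'). Each difference is read on a single clock, so the offset between the two clocks cancels. Only the drift accumulated over B's hold time t2' - t1' remains. The function names are hypothetical, and timestamp wraparound is ignored.

```python
def rtt_sample_us(t1: int, t1p: int, t2p: int, t2: int) -> int:
    """RTT in microseconds from four timestamps (t1p = t1', t2p = t2').

    t1 and t2 are read on A's clock, t1p and t2p on B's clock; subtracting
    B's hold time removes both the hold time and the clock offset.
    """
    return (t2 - t1) - (t2p - t1p)


def drift_error_us(hold_time_us: int, drift_ppm: int) -> int:
    """RTT error due to relative clock drift accumulated over B's hold time."""
    return hold_time_us * drift_ppm // 1_000_000


# A 250 us one-way link, B's clock offset by an arbitrary amount,
# and a 2 s hold time between Hello and IHU.
offset = 123_456_789
t1 = 1_000_000
t1p = t1 + 250 + offset
t2p = t1p + 2_000_000
t2 = t2p - offset + 250
print(rtt_sample_us(t1, t1p, t2p, t2))  # 500
print(drift_error_us(2_000_000, 10))    # 20 (typical crystal)
print(drift_error_us(2_000_000, 30))    # 60 (worst case)
```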
"Two orders of magnitude" means that the error is in the percent range or below, meaning that it can be ignored for your purpose. For the crystals, I agree there's little issue, and little point in having the text above. The troubling error does not come from the drift between message and response, but from the local system load that is added to the real transfer latency: for example, how long an incoming message will be queued before processing, or, if two routes go over the same Wi-Fi adapter, how the Wi-Fi MAC/PHY queues will differentially impact the two measurements. Are both errors also in the percent range or below? Probably, given rtt-min.
Back to my earlier question of which step in the stack is relevant for this measurement: surely any step that depends on the load of the system (variable, but independent of the link being used), as opposed to the load on the transmission, should be omitted.
So if a router is loaded, we'll get an extra 1ms jitter. This is not likely to impact route selection, and even if it does, it will merely cause the protocol to route around overloaded routers.
These are the words I was looking for: that jitter will be
ignored because of hysteresis, rtt-min, smoothing, etc.
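The following Python sketch gives a rough illustration of why a 1ms load-induced spike is damped. It assumes that the smoothing of Section 4.1 is an exponential moving average. The decay factor alpha = 0.836 is an assumed illustrative value; Section 4.1 of the draft gives the authoritative formula and default.

```python
def smooth(samples_us, alpha=0.836):
    """Exponentially smoothed RTT after each sample (alpha is assumed)."""
    smoothed = float(samples_us[0])
    history = [smoothed]
    for sample in samples_us[1:]:
        # Each new sample moves the estimate by only (1 - alpha) of the gap.
        smoothed = alpha * smoothed + (1 - alpha) * sample
        history.append(smoothed)
    return history


# Steady 5 ms RTT with a single 1 ms spike caused by a loaded router.
history = smooth([5000, 5000, 5000, 6000, 5000, 5000])
print([round(x) for x in history])  # [5000, 5000, 5000, 5164, 5137, 5115]
```

The spike raises the estimate by about 164µs and then decays. For a link whose smoothed RTT stays below rtt-min, it does not change the cost at all.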
Second, using the RTT signal for route selection gives rise to a negative feedback loop: when a route has a low RTT, it is deemed to be more desirable, which causes it to be used for more data traffic, which may lead to congestion, which in turn increases the RTT. Without some form of hysteresis, using RTT for route selection would lead to oscillations between parallel routes, which might lead to packet reordering and negatively affect upper-layer protocols (such as TCP).
I believe this discussion should appear earlier in the text, e.g., in the introduction (not the solution, but at least that the issue exists and is addressed by the protocol). See my earlier comment on ARPANET.
I most respectfully disagree. This document is structured in two parts: a first part that defines a subprotocol that produces a continuous stream of RTT samples, and a second part that describes an algorithm to extract from that stream information that is useful for route selection.
OK with the subprotocol structure. My request is really that the reader
learns early on that you handle the oscillations and reasonable jitter
(latency variations).
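The following toy Python sketch makes the oscillation concern concrete. It models route selection over two parallel routes whose costs fluctuate. Hysteresis here is a simple switching threshold, an illustrative stand-in rather than the algorithm of Appendix A.3 of RFC 8966. The route names, costs and threshold are made up.

```python
def select_route(current, costs, threshold):
    """Switch away from `current` only if another route is better by more
    than `threshold`; with threshold = 0 this is plain lowest-cost selection."""
    best = min(costs, key=costs.get)
    if current is None or costs[best] + threshold < costs[current]:
        return best
    return current


def run(cost_series, threshold):
    """Return the route chosen after each cost update."""
    current, chosen = None, []
    for costs in cost_series:
        current = select_route(current, costs, threshold)
        chosen.append(current)
    return chosen


# Two parallel routes whose costs fluctuate around each other,
# followed by a real degradation of route A.
series = [
    {"A": 100, "B": 105},
    {"A": 106, "B": 104},
    {"A": 103, "B": 107},
    {"A": 110, "B": 104},
    {"A": 130, "B": 100},
]
print(run(series, threshold=0))   # ['A', 'B', 'A', 'B', 'B']
print(run(series, threshold=10))  # ['A', 'A', 'A', 'A', 'B']
```

Without hysteresis, the selection flaps on every small fluctuation. With the threshold, it switches only once the degradation is significant.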
4.3. Hysteresis
Even after applying a bounded mapping from smoothed RTT to a cost value, the cost may fluctuate when a link's RTT is between rtt-min and rtt-max. This is effectively mitigated by using a robust hysteresis algorithm, such as the one described in Appendix A.3 of [RFC8966].
If this is what solves the oscillation issue, please mention it.
No, it's more complex than that. There are three distinct mechanisms that collaborate to avoid oscillations. The smoothing in Section 4.1 avoids oscillations due to outliers. The non-linear mapping from RTT to cost described in Section 4.2 avoids oscillations for good links (below rtt-min) and for bad links (above rtt-max). Hysteresis is a last-resort mechanism that mitigates the issue for links between rtt-min and rtt-max. I've just re-read Section 4, and I think it's clear enough. Please let me know if you have suggestions to make it better.
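The mapping from RTT to cost mentioned above can be sketched in Python as a clamped linear ramp. The following values are assumed for illustration:

- rtt-min = 10ms
- rtt-max = 120ms
- maximum penalty = 150 (the parameter name max_penalty is hypothetical)

Section 4.2 of the draft gives the authoritative formula and defaults.

```python
def rtt_cost(rtt_ms: float, rtt_min: float = 10.0, rtt_max: float = 120.0,
             max_penalty: float = 150.0) -> float:
    """Extra cost for a link with smoothed RTT `rtt_ms` (defaults assumed).

    Flat at 0 below rtt_min and flat at max_penalty above rtt_max, so
    RTT variations for very good or very bad links do not change the cost;
    only links in between see a cost that tracks the RTT.
    """
    if rtt_ms <= rtt_min:
        return 0.0
    if rtt_ms >= rtt_max:
        return max_penalty
    return max_penalty * (rtt_ms - rtt_min) / (rtt_max - rtt_min)


print(rtt_cost(0.5))    # 0.0 (Paris-Orléans: treated as a local link)
print(rtt_cost(65.0))   # 75.0
print(rtt_cost(250.0))  # 150.0
```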
Do not change Section 4 for me, but maybe the text above is
exactly the forward reference I'm looking for early in the text.
Maybe discuss the consequences of a MITM that modifies the values, e.g., to discourage Paris-to-Paris routing and cause routing via Tokyo?
If you're not using cryptographic signatures, then a MITM has easier ways to redirect traffic. See Section 6 of RFC 8966.
Great, please add that to the security section.
Many thanks!
Pascal
-- Juliusz
--
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call