Reviewer: Wesley Eddy
Review result: Not Ready

The terminology "RTO estimate" used throughout the document is confusing to me. The RTO is a concrete value, not an estimated one: it is computed from estimates of the RTT and the RTT variation. You could talk about estimating the "optimal" RTO value (for some definition of optimal), but I don't think that's what is meant here. Similarly, Section 4.2 is titled "Measured RTO Estimate", but the RTO is not a measured quantity; it is always computed. I think this terminology needs to be corrected throughout the document. (A minimal sketch of the standard computation is appended after these comments.)

Section 3 seems important to me, but it doesn't say very clearly what it means by "generally applicable". Does that mean the mechanism could run across the Internet? Does it work with very short or very long delays, or only with delays around the values mentioned in Appendix C? Does it work over very low-bandwidth links? Is it efficient at very high bandwidth (e.g., in the Gbps range)? Since there are many classes of IoT device and many possible use cases, it seems important to be clearer about the envisioned use cases, or at least about the specific ones that have been explored to date versus what has not been explicitly considered but might (or might not) also work. The appendix just uses the word "diverse" and mentions a couple of link technologies, but otherwise provides no enlightenment.

The first sentence in Section 4 doesn't make much sense to me, since the default timeout doesn't imply any knowledge of the RTT. Do you mean to say that a more appropriate RTO can be computed once some RTT samples are available? The wording could be clarified here.

The description at the beginning of Section 4.2 says that ambiguous samples resulting from retransmissions are used in the "weak" estimator, and seems to be saying that Karn's algorithm is not used for filtering samples. The rationale seems to be in Section 4.2.2, but the text there is vague. In general, using ambiguous samples would seem to result only in a timeout that is potentially slower than necessary, but still faster than the default; that seems inherently safe, and I'd think a stronger argument could be made than the current text offers. That said, the statement in this section that the rate of retries is reduced does not make sense: any time the RTO decreases, the rate of retries should increase, all other things being equal (halving the RTO, for example, doubles the rate at which retransmissions can fire).

Is there sensitivity to the weights chosen for the EWMA? This has been studied a bit for TCP, but the results may differ in CoAP scenarios, where there are typically far fewer samples. (An illustrative dual-estimator sketch is also appended below.)

Why is this being targeted for just Informational rather than Experimental or better? It is marked Informational in both the header and Section 1.1, but I didn't notice an explanation of why the WG thinks it wouldn't be a candidate for widespread use. Is there a concern that needs to be described?
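
To make the terminology point concrete, here is a minimal sketch of the standard RTO computation as TCP specifies it in RFC 6298. The constants ALPHA, BETA, K, and the initial 1-second RTO are the RFC's; the class wrapper and variable names are mine, for illustration only. The RTO is a deterministic function of the SRTT and RTTVAR estimators; only those are estimated.

```python
# Sketch of the RFC 6298 RTO computation. The RTO is never measured or
# estimated directly: it is computed from two running estimators, SRTT
# (smoothed RTT) and RTTVAR (RTT variation), which are themselves
# updated from RTT measurements.

ALPHA = 1.0 / 8  # SRTT gain (RFC 6298 recommended value)
BETA = 1.0 / 4   # RTTVAR gain (RFC 6298 recommended value)
K = 4            # RTTVAR multiplier in the RTO formula
G = 0.1          # clock granularity in seconds (implementation-specific)

class RtoState:
    def __init__(self):
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0  # initial RTO before any sample (RFC 6298: 1 s)

    def on_rtt_sample(self, r):
        """Update SRTT/RTTVAR from RTT measurement r (in seconds), then
        recompute the RTO from them."""
        if self.srtt is None:
            # First sample initializes the estimators (RFC 6298, 2.2).
            self.srtt = r
            self.rttvar = r / 2
        else:
            # Later samples: EWMA updates, RTTVAR first (RFC 6298, 2.3).
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        # The RTO itself is a plain function of the two estimators.
        # (RFC 6298 additionally rounds an RTO below 1 s up to 1 s.)
        self.rto = self.srtt + max(G, K * self.rttvar)
```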
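
And to illustrate the weak/strong estimator comments, the following is a hedged sketch of one way a dual-estimator scheme could fold ambiguous samples into a "weak" estimate instead of discarding them per Karn's algorithm. The structure, the names, and the STRONG_GAIN/WEAK_GAIN blending weights are my own assumptions, not the draft's actual algorithm; the point is that these weights are exactly the kind of knob whose sensitivity deserves discussion in the document.

```python
# Illustrative sketch only (NOT the draft's algorithm): unambiguous RTT
# samples update a "strong" estimator, ambiguous samples from exchanges
# that involved a retransmission update a separate "weak" estimator, and
# the overall RTO is a blend that gives weak updates less influence.

STRONG_GAIN = 0.5  # hypothetical blending weight for unambiguous samples
WEAK_GAIN = 0.25   # hypothetical, smaller weight for ambiguous samples
K = 4              # RTTVAR multiplier, as in RFC 6298

def ewma(old, new, gain):
    return (1 - gain) * old + gain * new

class DualRtoEstimator:
    """One (SRTT, RTTVAR) pair per sample class, blended into an overall
    RTO so that ambiguous (weak) samples move it more slowly."""

    def __init__(self, default_rto=2.0):
        # 2 s is CoAP's default ACK_TIMEOUT base (RFC 7252).
        self.overall_rto = default_rto
        self.strong = None  # (srtt, rttvar) from unambiguous samples
        self.weak = None    # (srtt, rttvar) from ambiguous samples

    @staticmethod
    def _update(state, r):
        # Standard SRTT/RTTVAR update, as in RFC 6298.
        if state is None:
            return (r, r / 2)
        srtt, rttvar = state
        rttvar = 0.75 * rttvar + 0.25 * abs(srtt - r)
        srtt = 0.875 * srtt + 0.125 * r
        return (srtt, rttvar)

    def on_sample(self, r, retransmitted):
        """r is an RTT sample in seconds. If the exchange involved a
        retransmission, the sample is ambiguous (it may belong to any of
        the transmissions) and only updates the weak side."""
        if retransmitted:
            self.weak = self._update(self.weak, r)
            srtt, rttvar = self.weak
            gain = WEAK_GAIN
        else:
            self.strong = self._update(self.strong, r)
            srtt, rttvar = self.strong
            gain = STRONG_GAIN
        self.overall_rto = ewma(self.overall_rto, srtt + K * rttvar, gain)
```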