Re: [Last-Call] Genart last call review of draft-ietf-tcpm-rto-consider-14

> On 5 Jun 2020, at 17:43, Mark Allman <mallman@xxxxxxxx> wrote:
> 
> 
> Hi Stewart!
> 
> Thanks for the feedback.  Sorry for the long RTT.  I had a recent
> deadline and am now trying to dig out.
> 
>> Major issues:
>> 
>> As far as I can see this text only applies to exchanges between
>> applications and network support applications such as
>> DNS. I.e. this is targeted at layer 4 and above. Given the
>> religious nature of BCPs in the eyes of some reviewers, and to
>> prevent endless explanations by those that design routing
>> protocols, OAM and other lower-layer sub-systems, I think there
>> needs to be scoping text in block capitals at the very start
>> of the document.
> 
> I am not entirely sure what you're suggesting here.  Per note to
> Tom, I am going to add a few words to the intro.  Maybe that will
> help.  I think it's unlikely I'll use block capitals! :-)


In the text, the discussion, examples and base learning seem to derive
from the transport layer.

RTT issues apply to other aspects of the Internet, and you either need to
analyse and discuss them in the same depth and apply your conclusions
accordingly, or you need to explicitly exclude them.

What I am hoping we can do is to prevent a single-focus review of
Internet operations giving rise to endless discussion and argument when
legitimate designs are proposed for other aspects.

The antithesis of the big-I Internet is a service provider domain, where
different considerations often apply.

> 
>> =========
>> 
>>      - The requirements in this document may not be appropriate in all
>>        cases and, therefore, inconsistent deviations may be necessary
>>        (hence the "SHOULD" in the last bullet).  However,
>>        inconsistencies MUST be (a) explained and (b) gather consensus.
>> 
>> SB> That can be quite an onerous obligation and provides scope for
>> SB> endless argument when reviewers are not domain experts in the
>> SB> protocol being designed.
> 
> This was added because another reviewer thought it was for sure
> necessary.
> 
> I guess I don't understand why you'd call this 'an onerous
> obligation' since presumably you'd do it anyway without this
> document.  

Not necessarily. There is a lot of experience in how to run the layers
below transport.


> Are we ramming things through without consensus?

Well, if it impacts Int and Routing and there has not been widespread
review there, the answer is yes. You are getting pushback from a
Gen-ART reviewer who happens to be a specialist in the lower layers,
so arguably the process is working, but wider review is always better.

>  If not
> (my assumption), (b) is no sweat.  Are we ramming things through
> without thought?  If not (my assumption), (a) is straightforward and
> hopefully is being done anyway.  In other words, I don't understand
> the complaint here because if you don't want to use the guidelines
> then that is fine, but in going through the standard process to
> define a loss detector you'll end up meeting this bullet.  Even if
> this document doesn't get published or didn't exist our documents
> should still be meeting this bullet.

Yes, but having seen far too many religious standoffs between the areas
over the years, it would be nice not to create the basis for more such
events.

> 
>> =======
>> 
>>          While there are a bevy of uses for timers in protocols---from
>>          rate-based pacing to connection failure detection and
>>          beyond---these are outside the scope of this document.
>> 
>> SB> I am not sure what that means for the applicability of this
>> SB> document.
> 
> This was added at some point along the way because someone thought
> something like rate-based pacing could be covered by the guidelines
> and the intent is to say it is not.  I have zero love for this bit
> and would happily remove it, but am loath to do so because the old
> comment will then come back.

Perhaps a more precise definition is required. It is rather general.

> 
>> =========
>> 
>>    (1) As we note above, loss detection happens when a sender does not
>>        receive delivery confirmation within some expected period of
>>        time.  In the absence of any knowledge about the latency of a
>>        path, the initial RTO MUST be conservatively set to no less than
>>        1 second.
>> 
>> SB> This issue may be addressed by the scoping text, but 1s is no
>> SB> use when you are trying to detect sub 50ms of packet loss in
>> SB> the infrastructure.
> 
> We have to start somewhere when we know nothing.
> 
> I think in my thread with Tom we hit upon this notion that the
> document is really about sort of arbitrary, unknown and therefore
> presumed unreliable networks.  I am going to add some words to this
> effect.  Does this help?
> 
> Again, for specific environments where things are more nailed down
> and known, deviations are fine and explicitly OK.  But, as a general
> default I think saying "when you don't know anything < 50msec is
> cool" is unlikely to be appropriate.  Well, no, I think it would be
> quite inappropriate, actually.


As you may gather, my concern is that by making this a catch-all position
you catch too much and put a burden of work on too many other groups.

If you are talking about L4 and above in the big-I Internet I am not
concerned, but as written it has a much larger scope and that concerns me.

> 
>> =============
>> 
>>    (3) Each time the RTO is used to detect a loss, the value of the RTO
>>        MUST be exponentially backed off such that the next firing
>>        requires a longer interval.  The backoff SHOULD be removed after
>>        either (a) the subsequent successful transmission of
>>        non-retransmitted data, or (b) an RTO passes without detecting
>>        additional losses.  The former will generally be quicker.  The
>>        latter covers cases where loss is detected, but not repaired.
>> 
>>        A maximum value MAY be placed on the RTO.  The maximum RTO MUST
>>        NOT be less than 60 seconds (as specified in [RFC6298]).
>> 
>>        This ensures network safety.
>> 
>> SB> This does not work in OAM applications.
> 
> Well, OK, get consensus to do something different---which is
> completely fine.  I think retransmission timers have shown
> themselves to be crucial for preventing collapse and, again, as a
> default I think this is our best advice.

No, I think you need to show that you have discussed this with other
groups before imposing it on them.
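
For anyone following the thread, the rule under discussion (guideline (3)
quoted above, together with the 1 second initial value from guideline (1))
can be sketched roughly as below. This is only an illustration of the quoted
text, not code from the draft: the class and method names are hypothetical,
and the doubling factor is an assumption borrowed from common TCP practice,
since the quoted text only says "exponentially".

```python
from dataclasses import dataclass

INITIAL_RTO = 1.0   # seconds; guideline (1): no less than 1 s with no path knowledge
MIN_MAX_RTO = 60.0  # seconds; guideline (3): an optional cap must not be below 60 s


@dataclass
class RtoTimer:
    """Hypothetical helper illustrating the quoted backoff rule."""
    base_rto: float = INITIAL_RTO
    max_rto: float = MIN_MAX_RTO
    backoff: int = 0  # consecutive RTO firings since the last reset

    def __post_init__(self) -> None:
        if self.max_rto < MIN_MAX_RTO:
            raise ValueError("maximum RTO must not be less than 60 seconds")

    def current(self) -> float:
        # Assumed doubling per firing, clamped to the optional maximum.
        return min(self.base_rto * (2 ** self.backoff), self.max_rto)

    def on_timeout(self) -> float:
        # The RTO was used to detect a loss: the next firing takes longer.
        self.backoff += 1
        return self.current()

    def on_new_data_acked(self) -> None:
        # Case (a): non-retransmitted data was delivered successfully.
        self.backoff = 0

    def on_quiet_rto(self) -> None:
        # Case (b): a full RTO passed without detecting additional losses.
        self.backoff = 0
```

With the defaults, successive timeouts give 2 s, 4 s, 8 s and so on up to
the 60 s cap, which is the behaviour that does not fit a sub-50 ms OAM
detector.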

> 
>> Minor issues:
>> 
>> "By waiting long enough that we are unambiguously
>>  certain a packet has been lost we cannot repair losses in a timely
>>  manner and we risk prolonging network congestion."
>> 
>> I have a concern here that the emphasis is on classical
>> operation. We are beginning to see applications run over the
>> network where the timely delivery of a packet is critical for
>> correct operation of even SoL. As a BCP the text needs to
>> recognise that the scope and purpose of IP is changing and that
>> classical learning and rules derived from them may not apply.
>> 
>> Also if not ruled out of scope earlier we need to be clear at this
>> point that things like BFD have different considerations.
> 
> I am going to suggest we revisit this after I hack out a little
> extra text for the intro.  You can see if that helps.

OK, let’s see the new text.

> 
>> ==========
>> 
>>      "- This document does not update or obsolete any existing RFC.
>>        These previous specifications---while generally consistent with
>>        the requirements in this document---reflect community consensus
>>        and this document does not change that consensus."
>> 
>> I think it needs to be clear that adherence to this RFC is not
>> required for minor updates and extensions to existing RFCs. Having
>> seen minor routing extension held up by security concerns related
>> to underlying protocols rather than the extension itself there is
>> a lot of sensitivity on this point in some quarters of the IETF.
> 
> Um.  Do you have suggested words?  I am not much of a protocol
> lawyer (thankfully!), but I am not really conjuring the case you're
> concerned about.  Something like ...
> 
>  (1) RFC XXXX was published 10 years ago and violates
>      rto-consider.
>  (2) We want to do a XXXXbis.
>  (3) The bis has to then explain why it's cool to violate
>      rto-consider.
> 
> ... ?
> 
> I would say if XXXX has a loss detector that had consensus and has
> been in use for a while it'd be pretty easy to get consensus for
> XXXXbis that we can still use it as it has worked fine.

I have seen a number of cases where minor changes to protocols became
hostage to the ambitions of others to make a fundamental change,
and I hope we can avoid this here.

> 
>> It might be useful to make it clear that there are some
>> applications that would prefer no data to late data.
> 
> This document is about loss detection, not what one does after
> detecting.  So, we do say ...
> 
>    However, as discussed above, the detected loss need not be
>    repaired
> 
> I am happy to reinforce this point.  Text suggestions welcome.

OK

> 
>> Nits/editorial comments:
>> 
>> The terminology section confuses ID-nits - I think it should be a
>> section in its own right later in the document.
> 
> Yeah, id-nits as it is run when submitting doesn't flag this.  It
> was flagged by someone else in LC.  Because I am old school it's
> hard to renumber everything and so I was just leaving this for the
> rfc-ed to do something reasonable here.

I run ID-nits in verbose mode before submission. This is partly
a matter of respect for reviewers and the editors, and partly because I
figure that the more errors you deal with, the easier it is to find the
important errors that they may mask.


> 
>> The following nits issues need looking at
>> 
>>  == Missing Reference: 'RFC5681' is mentioned on line 377, but not defined
>> 
>>  == Unused Reference: 'RFC3940' is defined on line 515, but no explicit
>>     reference was found in the text
>> 
>>  == Unused Reference: 'RFC4340' is defined on line 519, but no explicit
>>     reference was found in the text
>> 
>>  == Unused Reference: 'RFC6582' is defined on line 540, but no explicit
>>     reference was found in the text
> 
> I will fix all these.  Again, I was trusting the id-nits when I
> submitted and these were not flagged (or, if they were it wasn't in
> a way that foisted them on my screen).  But, they're easy fixes, so
> thanks!
> 
> allman

Thanks

Stewart

-- 
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call



