Re: [Last-Call] [tcpm] Last Call: <draft-ietf-tcpm-rack-13.txt> (The RACK-TLP loss detection algorithm for TCP) to Proposed Standard

Markku Kojo <kojo=40cs.helsinki.fi@xxxxxxxxxxxxxx> · Sat, 19 Dec 2020 02:41:31 +0200 (EET)

Hi Neal,

On Fri, 18 Dec 2020, Neal Cardwell wrote:

On Wed, Dec 16, 2020 at 2:39 PM Markku Kojo <kojo@xxxxxxxxxxxxxx> wrote:
      > For (2), the RTO timer is still operative so
      > the RTO recovery rules would still follow.

      In short:
      When the timer (RTO) expires in a non-RACK-TLP implementation: cwnd=1 MSS,
      and slow start is entered.
      When the timer (PTO) expires in a RACK-TLP implementation,
      normal fast recovery is entered (unless PRR is also
      implemented). So there is no RTO recovery, as explicitly stated in Sec. 7.4.1.

      This means that this document explicitly modifies standard TCP congestion
      control when no ACKs are arriving and the retransmission timer
      expires:

      from: RTO=SRTT+4*RTTVAR (RTO used for arming the timer)

It's also worth mentioning this aspect of [RFC6298]:

Sure.

   (2.4) Whenever RTO is computed, if it is less than 1 second, then the
         RTO SHOULD be rounded up to 1 second.

             1. RTO timer expires
             2. cwnd=1 MSS; ssthresh=FlightSize/2; rexmit one segment
             3. Ack of rexmit sent in step 2 arrives
             4. cwnd = cwnd+1 MSS; send two segments
             ...

      to:   PTO=min(2*SRTT,RTO) (PTO used for arming the timer)
             1. PTO timer expires
             2. (cwnd=1 MSS); (re)xmit one segment

It may be worthwhile to point out here that the RACK-TLP draft does not specify setting cwnd
to 1 at this point, and the Linux TCP implementation from our team does not do this.

Yes, that's why I put it in parentheses. In my view the RACK-TLP 
draft implicitly limits cwnd to one segment by allowing just one TLP 
probe segment.

The rationale is that at this point there is no solid evidence that anything has been lost, and
setting cwnd to 1 at this point would make the algorithm more timid than the preceding
approaches, for no good reason.

Sure, no need to set cwnd at this point.

A good reason could be: no feedback, so the ACK clock is lost? But, of course, 
it is too early to do so, even though after the arrival of the ACK the sender 
may well modify cwnd again, as it now does if it decides that a segment 
other than the probe segment was lost.

             3. Ack of (re)xmit sent in step 2 arrives
             4. cwnd = ssthresh = FlightSize/2; send N=cwnd segments
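As a rough numeric comparison of the two timer-arming rules quoted above, the following Python sketch computes RTO per RFC 6298 (SRTT + max(G, 4*RTTVAR), with the 1-second floor from rule (2.4)) and the simplified PTO=min(2*SRTT,RTO) used in this message. The clock granularity G and the sample SRTT/RTTVAR values are assumptions chosen for illustration, and the sketch follows only the simplified PTO formula from this thread, not every detail of the draft's PTO computation.

```python
# Assumed clock granularity G, in seconds (illustrative value only).
CLOCK_GRANULARITY_S = 0.001


def rfc6298_rto(srtt: float, rttvar: float, g: float = CLOCK_GRANULARITY_S) -> float:
    """RTO = SRTT + max(G, 4*RTTVAR), rounded up to 1 second per rule (2.4)."""
    rto = srtt + max(g, 4 * rttvar)
    return max(rto, 1.0)


def simplified_pto(srtt: float, rto: float) -> float:
    """PTO = min(2*SRTT, RTO), the simplified form used in this thread."""
    return min(2 * srtt, rto)


if __name__ == "__main__":
    srtt, rttvar = 0.100, 0.025  # hypothetical path with a 100 ms SRTT
    rto = rfc6298_rto(srtt, rttvar)
    pto = simplified_pto(srtt, rto)
    print(f"RTO = {rto:.3f} s, PTO = {pto:.3f} s")
```

On this hypothetical 100 ms path the probe timer fires after about 200 ms, while a sender applying the 1-second floor waits a full second before its first timeout.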

That step (4) assumes a particular congestion control implementation that is different than
what we would recommend.

Ok. I just used the Standards Track formula, as the RACK-TLP draft does in 
its examples, and because the RACK-TLP draft states that it does not modify 
current congestion control.

      For example, if FlightSize is 100 segments when the timer expires,
      congestion control is the same in steps 1-3, but in step 4 the
      current standard congestion control allows transmitting 2 segments,
      while RACK-TLP would allow blasting 50 segments.
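The arithmetic in this example can be reproduced with a small Python sketch. It counts cwnd in segments (1 MSS = 1), uses ssthresh = max(FlightSize/2, 2) as in RFC 5681, and models only the simplified steps listed above: slow start after an RTO versus cwnd = ssthresh when the ACK for the probe leads into unmodified RFC 6675-style fast recovery. The function names are hypothetical.

```python
from typing import Tuple


def ssthresh_after_loss(flight_size: int) -> int:
    """RFC 5681: ssthresh = max(FlightSize / 2, 2*SMSS), counted in segments."""
    return max(flight_size // 2, 2)


def segments_after_rto(flight_size: int) -> Tuple[int, int]:
    """Steps 1-4 of the RFC 5681 sequence; returns (cwnd, ssthresh)."""
    ssthresh = ssthresh_after_loss(flight_size)
    cwnd = 1   # step 2: loss window of one segment
    cwnd += 1  # step 4: slow start adds 1 MSS for the ACK of the rexmit
    return cwnd, ssthresh


def segments_after_tlp(flight_size: int) -> Tuple[int, int]:
    """Steps 1-4 of the RACK-TLP sequence with unmodified RFC 6675 recovery."""
    ssthresh = ssthresh_after_loss(flight_size)
    cwnd = ssthresh  # step 4: fast recovery sets cwnd = ssthresh
    return cwnd, ssthresh


if __name__ == "__main__":
    flight = 100
    print("RFC 5681 after RTO: ", segments_after_rto(flight))
    print("RACK-TLP + RFC 6675:", segments_after_tlp(flight))
```

With FlightSize = 100 this gives cwnd = 2 after the RTO sequence and cwnd = 50 after the TLP sequence, matching the 2 versus 50 segments above.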

      The question is: what is the justification for modifying standard TCP
      congestion control to use fast recovery instead of slow start in a
      case where a timeout is needed to detect loss because there is no
      feedback and the ACK clock is lost? The draft does not give any
      justification. This clearly conflicts with items (0) and (1)
      in BCP 133 (RFC 5033).

The draft pointedly does not modify standard TCP congestion control.

RACK-TLP does not specify using fast recovery instead of slow start for a case where timeout
is needed to detect loss because there is no feedback and the ACK clock is lost. Rather,
RACK-TLP only triggers fast recovery if there *is* ACK feedback providing an ACK clock and
strong evidence of a packet loss.

So here our views diverge. In the above steps I decoupled congestion 
control from which segments are sent (rexmit and xmit are mentioned there 
just as comments to check what is going on; they can be freely removed).
Congestion control governs how many segments can be sent.

In my view, when there is no feedback, RACK-TLP uses a timeout (PTO) to help 
make progress. Without the timeout it cannot make progress. Just like 
an RFC 5681 sender, it cannot make progress until the timeout expires. 
So this should be taken as the criterion to (effectively) enter slow start, 
once loss is detected.

Or, at least, I don't see why a different timeout value would 
change the congestion control.

When the timeout expires, RACK-TLP sends one segment (just like an RFC 5681 
sender when the RTO expires). The only difference is that an RFC 5681 sender 
selects a different segment (the first unacknowledged segment) to retransmit 
"blindly" in order to get feedback and start the ACK clock. RACK-TLP 
"blindly" sends the last segment from the retransmission queue (or a new 
segment). Selecting a different segment for transmission upon timeout 
does not change anything, in my view. In both cases it is a "blind" 
selection; the sender does not know what was lost. And in both cases the 
ACK for this one segment provides feedback about what may have 
been lost. The only difference there is that the segment RACK-TLP 
selects for transmission is a better choice when the SACK option is used, 
because it provides more information.
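To make the comparison concrete, here is a Python sketch of the two "blind" choices described above. It represents the retransmission queue as a list of sequence numbers of unacknowledged segments; `can_send_new` is a hypothetical flag standing in for "new data is available and the windows allow sending it". The sketch only models which segment is chosen, not how much cwnd allows.

```python
from typing import Optional, Sequence, Tuple


def rto_probe_choice(unacked: Sequence[int]) -> int:
    """RFC 5681 RTO: retransmit the first unacknowledged segment."""
    return min(unacked)


def tlp_probe_choice(unacked: Sequence[int],
                     can_send_new: bool) -> Tuple[str, Optional[int]]:
    """TLP as described here: a new segment if possible, else the last queued one."""
    if can_send_new:
        return ("new", None)
    return ("rexmit", max(unacked))


if __name__ == "__main__":
    queue = [1000, 1001, 1002, 1003]  # hypothetical unacknowledged segment numbers
    print(rto_probe_choice(queue))         # first unacknowledged segment
    print(tlp_probe_choice(queue, False))  # last segment in the queue
    print(tlp_probe_choice(queue, True))   # a new segment instead
```

In both cases exactly one segment is sent; only the choice of segment differs.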

If there is some difference, in that the ACK for RACK-TLP provides 
stronger evidence of packet loss (and of what was lost), then it should 
also be OK to modify the current standard TCP congestion control such that, 
upon RTO timeout, the sender does not select the first unacknowledged 
segment for blind retransmission but the last segment in the 
retransmission queue (or maybe a new segment). With SACK this would 
provide exactly the same information as the TLP probe does. And, upon arrival 
of the first ACK, RTO recovery would use rules similar to those in RACK-TLP to 
better decide whether it was a spurious RTO or a loss, and move from slow 
start to fast recovery and set cwnd=ssthresh.

I really don't see how this change in which segment is "blindly" retransmitted 
first in slow start would allow modifying congestion control for RTO recovery.

The main aspect of triggering loss recovery that is new is the approach of allowing a sender
to transmit one additional "probe" segment in flight after 2*SRTT. Once this is accepted, the
rest of the recovery process essentially follows from principles already generally accepted
in the IETF TCP community.

Could you please see above and explain (or provide a pointer to an RFC) 
what are those "principles already generally accepted in the IETF TCP 
community". That would help me to understand your point.

Put another way, it seems to me that if one is to object to TLP-triggered fast recovery, then
the objection must be mounted specifically against the permission granted to the sender to
transmit one additional "probe" segment in flight after 2*SRTT. Once that permission is
granted, there is nothing really new about TLP-triggered fast recovery.

I am sorry but I still fail to see what is the preceding evidence that 
makes this not new. A pointer could help.

In my view the probe is nothing to object to, as long as it is not 
counted as a cwnd increase in the later cwnd and ssthresh calculation 
(a minor detail, but someone might later suggest first two, then four, and so 
on probe segments, with the justification that it is just one more than 
before).

      Furthermore, there is no implementation nor experimental experience
      evaluating this change. The implementation with experimental experience
      uses PRR (RFC 6937) which is an Experimental specification including a
      novel "trick" that directs PRR fast recovery to effectively use slow
      start in this case at hand.

What do you think of the latest new text for "9.3.  Interaction with
congestion control" suggested by Yuchung on Thursday afternoon (Dec 17), which explicitly
recommends PRR? As mentioned earlier in this thread, there is considerable implementation and
experimental experience with RACK-TLP plus PRR since the Linux TCP stack has been using
RACK-TLP with PRR as the default loss recovery algorithm since Linux v4.18 in August 2018.

As I have already indicated, in my view PRR does not have the problem we 
are discussing here, because PRR-SSRB makes fast recovery behave like 
slow start, and PRR-CRB is even more conservative. So it would be a safe 
choice for this problem, unlike the current RFC 6675 algorithm.
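The claim that PRR-SSRB makes fast recovery behave like slow start can be illustrated with the per-ACK sending rule from the RFC 6937 pseudocode, sketched here in Python with all quantities in segments. The scenario values (RecoverFS=100, ssthresh=50, pipe=0 when the probe's ACK arrives, one segment newly delivered) are hypothetical and chosen to mirror the FlightSize=100 example earlier in this thread.

```python
def prr_sndcnt(prr_delivered: int, prr_out: int, delivered_data: int,
               pipe: int, ssthresh: int, recover_fs: int,
               conservative: bool = False, mss: int = 1) -> int:
    """Per-ACK send quota following the RFC 6937 pseudocode, in segments."""
    if pipe > ssthresh:
        # Proportional Rate Reduction: integer CEIL(prr_delivered*ssthresh/RecoverFS)
        return -(-(prr_delivered * ssthresh) // recover_fs) - prr_out
    if conservative:
        # PRR-CRB: send no more than has been delivered during recovery
        limit = prr_delivered - prr_out
    else:
        # PRR-SSRB: larger of outstanding delivery credit and DeliveredData,
        # plus one MSS; about two segments per ACK, as in slow start
        limit = max(prr_delivered - prr_out, delivered_data) + mss
    return min(ssthresh - pipe, limit)


if __name__ == "__main__":
    # Hypothetical state on the first ACK after the probe: all other
    # segments lost (pipe = 0), one segment newly delivered.
    state = dict(prr_delivered=1, prr_out=0, delivered_data=1,
                 pipe=0, ssthresh=50, recover_fs=100)
    print("PRR-SSRB:", prr_sndcnt(**state))
    print("PRR-CRB: ", prr_sndcnt(**state, conservative=True))
```

In this scenario PRR-SSRB permits 2 segments and PRR-CRB permits 1, whereas unmodified RFC 6675 recovery with cwnd = ssthresh = 50 and pipe = 0 would permit up to 50 segments on the same ACK.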

In other words, I only object to allowing the use of RACK-TLP with the 
RFC 6675 congestion control algorithm unmodified, because it does not have 
a safeguard like PRR. This does not mean that the RACK-TLP document would 
need to include the necessary modifications to the RFC 6675 algorithm.

I don't know about the process, but perhaps PRR cannot be used as a normative 
requirement because it is currently Experimental? I'm not quite sure, though.

Best regards,

/Markku

The exact commit is:

  b38a51fec1c1 tcp: disable RFC6675 loss detection

best,
neal
-- 
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call