Re: [Last-Call] [tcpm] Last Call: <draft-ietf-tcpm-rack-13.txt> (The RACK-TLP loss detection algorithm for TCP) to Proposed Standard

On Wed, Dec 16, 2020 at 2:39 PM Markku Kojo <kojo@xxxxxxxxxxxxxx> wrote:
> For (2), the RTO timer is still operative so
> the RTO recovery rules would still follow.

In short:
With a non-RACK-TLP implementation, when the timer (RTO) expires: cwnd=1 MSS,
and slow start is entered.
With a RACK-TLP implementation, when the timer (PTO) expires,
normal fast recovery is entered (unless PRR is also
implemented). So there is no RTO recovery, as explicitly stated in Sec. 7.4.1.

This means that this document explicitly modifies standard TCP congestion
control when there are no acks coming and the retransmission timer
expires

from: RTO=SRTT+4*RTTVAR (RTO used for arming the timer)

It's also worth mentioning this aspect of [RFC6298]:

   (2.4) Whenever RTO is computed, if it is less than 1 second, then the
         RTO SHOULD be rounded up to 1 second.
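
To make the effect of that 1-second floor concrete, here is a minimal sketch of the two timer values using the formulas as they are written in this message. It ignores the clock-granularity term in the full RFC 6298 formula, the function names are made up for illustration, and the SRTT/RTTVAR values are hypothetical.

```python
MIN_RTO = 1.0  # seconds; the floor from RFC 6298 (2.4)


def rto_seconds(srtt, rttvar):
    """RTO as written in this message (SRTT + 4*RTTVAR), rounded up to the 1 s floor."""
    return max(MIN_RTO, srtt + 4 * rttvar)


def pto_seconds(srtt, rttvar):
    """PTO as written in this message: min(2*SRTT, RTO)."""
    return min(2 * srtt, rto_seconds(srtt, rttvar))


# Hypothetical short path: SRTT = 50 ms, RTTVAR = 10 ms.
short_rto = rto_seconds(0.05, 0.01)  # raw value is about 90 ms, so the floor applies
short_pto = pto_seconds(0.05, 0.01)  # 2*SRTT is far below the floored RTO

# Hypothetical long path: SRTT = 2 s, RTTVAR = 0.5 s (the floor does not apply).
long_rto = rto_seconds(2.0, 0.5)
long_pto = pto_seconds(2.0, 0.5)
```

On the short path the timer that gets armed differs by roughly a factor of ten (1 s versus 100 ms), while on the long path the two values coincide.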
 
       1. RTO timer expires
       2. cwnd=1 MSS; ssthresh=FlightSize/2; rexmit one segment
       3. Ack of rexmit sent in step 2 arrives
       4. cwnd = cwnd+1 MSS; send two segments
       ...

to:   PTO=min(2*SRTT,RTO) (PTO used for arming the timer)
       1. PTO timer expires
       2. (cwnd=1 MSS); (re)xmit one segment

It may be worthwhile to point out here that the RACK-TLP draft does not specify setting cwnd to 1 at this point, and the Linux TCP implementation from our team does not do this. The rationale is that at this point there is no solid evidence that anything has been lost, and setting cwnd to 1 at this point would make the algorithm more timid than the preceding approaches, for no good reason.
 
       3. Ack of (re)xmit sent in step 2 arrives
       4. cwnd = ssthresh = FlightSize/2; send N=cwnd segments

That step (4) assumes a particular congestion control implementation that is different than what we would recommend.


For example, if FlightSize is 100 segments when timer expires,
congestion control is the same in steps 1-3, but in step 4 the
current standard congestion control allows transmitting 2 segments,
while RACK-TLP would allow blasting 50 segments.
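
The arithmetic in this example can be checked with a short sketch. It encodes the two four-step sequences exactly as the quoted text lists them, counting in whole segments. The function names are made up for illustration, and step 4 of the "to:" sequence is the quoted description only; the reply above states that it assumes a congestion control implementation different from the recommended one.

```python
def rto_sequence(flight_size):
    """'from:' sequence: RTO expiry followed by slow start.

    Returns (cwnd, ssthresh) in segments once the ACK of the
    retransmission has arrived (step 4).
    """
    ssthresh = flight_size // 2  # step 2, as written in the message
    cwnd = 1                     # step 2: cwnd = 1 MSS, retransmit one segment
    cwnd += 1                    # step 4: cwnd = cwnd + 1 MSS
    return cwnd, ssthresh


def quoted_pto_sequence(flight_size):
    """'to:' sequence as the quoted text describes it (disputed in the reply).

    Returns (cwnd, ssthresh) in segments once the ACK of the
    probe has arrived (step 4).
    """
    cwnd = ssthresh = flight_size // 2  # step 4: cwnd = ssthresh = FlightSize/2
    return cwnd, ssthresh


rto_cwnd, _ = rto_sequence(100)         # segments allowed under slow start
pto_cwnd, _ = quoted_pto_sequence(100)  # segments allowed in the quoted description
```

With FlightSize = 100 this reproduces the 2 versus 50 segments cited in the example.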

Question is: what is the justification to modify standard TCP
congestion control to use fast recovery instead of slow start for a
case where timeout is needed to detect loss because there is no
feedback and ack clock is lost? The draft does not give any
justification. This clearly is in conflict with items (0) and (1)
in BCP 133 (RFC 5033).

The draft pointedly does not modify standard TCP congestion control.

RACK-TLP does not specify using fast recovery instead of slow start for a case where timeout is needed to detect loss because there is no feedback and the ACK clock is lost. Rather, RACK-TLP only triggers fast recovery if there *is* ACK feedback providing an ACK clock and strong evidence of a packet loss.

The main aspect of triggering loss recovery that is new is the approach of allowing a sender to transmit one additional "probe" segment in flight after 2*SRTT. Once this is accepted, the rest of the recovery process essentially follows from principles already generally accepted in the IETF TCP community.

Put another way, it seems to me that if one is to object to TLP-triggered fast recovery, then the objection must be mounted specifically against the permission granted to the sender to transmit one additional "probe" segment in flight after 2*SRTT. Once that permission is granted, there is nothing really new about TLP-triggered fast recovery.
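
The distinction drawn in the last few paragraphs can be sketched as a small event handler. This is only an illustration under simplifying assumptions, not the draft's pseudocode or the Linux code: the event names, the Sender type, and the returned action strings are all hypothetical, windows are counted in whole segments, and the actual window reduction during fast recovery is left to the congestion control (for example PRR).

```python
from dataclasses import dataclass


@dataclass
class Sender:
    cwnd: int            # congestion window, in segments
    ssthresh: int        # slow start threshold, in segments
    state: str = "open"  # "open", "fast_recovery", or "rto_recovery"


def on_event(sender, event, flight_size):
    """Return the action taken for a (hypothetical) loss-detection event."""
    if event == "pto_expired":
        # No evidence of loss yet: send one probe segment, leave cwnd alone.
        return "send_probe"
    if event == "ack_with_loss_evidence":
        # ACK feedback exists and indicates a loss: enter fast recovery.
        sender.ssthresh = max(flight_size // 2, 2)
        sender.state = "fast_recovery"
        return "retransmit_lost_segments"
    if event == "rto_expired":
        # No feedback at all: the RTO timer is still operative, so the
        # usual RTO recovery applies (cwnd = 1 segment, slow start).
        sender.ssthresh = max(flight_size // 2, 2)
        sender.cwnd = 1
        sender.state = "rto_recovery"
        return "retransmit_first_unacked"
    raise ValueError(f"unknown event: {event}")


# Probe is answered by an ACK that shows a loss: fast recovery.
a = Sender(cwnd=100, ssthresh=1000)
a_probe = on_event(a, "pto_expired", flight_size=100)
a_cwnd_after_probe = a.cwnd
a_ack = on_event(a, "ack_with_loss_evidence", flight_size=100)

# Probe is never answered: the RTO fires and slow start follows.
b = Sender(cwnd=100, ssthresh=1000)
on_event(b, "pto_expired", flight_size=100)
b_rto = on_event(b, "rto_expired", flight_size=100)
```

In this sketch the only thing the PTO expiry itself changes is that one extra segment goes out; which recovery follows depends entirely on whether ACK feedback arrives.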

Furthermore, there is no implementation nor experimental experience
evaluating this change. The implementation with experimental experience
uses PRR (RFC 6937) which is an Experimental specification including a
novel "trick" that directs PRR fast recovery to effectively use slow
start in this case at hand.

What do you think of the new text for "9.3.  Interaction with congestion control" that Yuchung suggested Thursday afternoon (Dec 17), which explicitly recommends PRR? As mentioned earlier in this thread, there is considerable implementation and experimental experience with RACK-TLP plus PRR, since the Linux TCP stack has been using RACK-TLP with PRR as the default loss recovery algorithm since Linux v4.18 in August 2018. The exact commit is:

  b38a51fec1c1 tcp: disable RFC6675 loss detection

best,
neal

-- 
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call
