Re: [Last-Call] [tcpm] Last Call: <draft-ietf-tcpm-rack-13.txt> (The RACK-TLP loss detection algorithm for TCP) to Proposed Standard

Martin Duke <martin.h.duke@xxxxxxxxx> · Wed, 16 Dec 2020 12:46:12 -0800

Hi Markku,

There is a ton here, but I'll try to address the top points. Hopefully they obviate the rest.

1.
[Markku]

"Hmm, not sure what you mean by "this is a new loss detection after
acknowledgment of new data"?

But anyway, RFC 5681 gives the general principle to reduce cwnd and
ssthresh twice if a retransmission is lost but IMHO (and I believe many
who have designed new loss recovery and CC algorithms or implemented them
agree) that it is hard to get things right if only congestion control
principles are available and no algorithm."

[Martin]
So 6675 Sec 5 is quite explicit that there is only one cwnd reduction per fast recovery episode, which ends once new data has been acknowledged. By definition, if a retransmission is lost it is because newer data has been acknowledged, so it's a new recovery episode. Meanwhile, during the Fast Recovery period the incoming acks implicitly remove data from the network and therefore keep flightsize low.
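
To make the "one reduction per episode" reading concrete, here is a minimal sketch. It is not taken from RFC 6675 or from the draft. It assumes integer sequence numbers without wraparound and window sizes counted in whole segments. The class and method names are hypothetical; only RecoveryPoint, HighData and FlightSize are RFC 6675 terms.

```python
class RecoveryState:
    """Hypothetical sender state, tracking only what this example needs."""

    def __init__(self, cwnd):
        self.cwnd = cwnd
        self.recovery_point = None  # None means "not in loss recovery"

    def on_loss_detected(self, flight_size, high_data):
        """Return True if this loss starts a new episode and reduces cwnd."""
        if self.recovery_point is not None:
            # Already inside an episode: no second reduction.
            return False
        self.recovery_point = high_data  # RecoveryPoint = HighData
        self.cwnd = flight_size // 2     # ssthresh = cwnd = FlightSize / 2
        return True

    def on_cumulative_ack(self, ack_seq):
        """A cumulative ACK beyond RecoveryPoint ends the episode."""
        if self.recovery_point is not None and ack_seq > self.recovery_point:
            self.recovery_point = None


state = RecoveryState(cwnd=100)
first = state.on_loss_detected(flight_size=100, high_data=1000)
second = state.on_loss_detected(flight_size=80, high_data=1200)
cwnd_in_episode = state.cwnd
state.on_cumulative_ack(1001)  # episode ends here
third = state.on_loss_detected(flight_size=40, high_data=2000)
cwnd_after_new_episode = state.cwnd
```

Under these assumptions, a loss detected inside the episode leaves cwnd alone, while a loss detected after the episode has ended triggers a fresh reduction.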

We can continue to go around on our interpretation of these documents, but fundamentally if there is ambiguity in 5681/6675 we should bis those RFCs rather than expand the scope of RACK.

2.
[Markku]
"
In short:

When, with a non-RACK-TLP implementation, the timer (RTO) expires:
cwnd=1 MSS, and slow start is entered.

When, with a RACK-TLP implementation, the timer (PTO) expires, normal
fast recovery is entered (unless also implementing PRR). So no RTO
recovery, as explicitly stated in Sec. 7.4.1."

[Martin]
There may be a misunderstanding here. PTO is not the same as RTO, and both mechanisms exist! The loss response to a PTO is to send a probe; the RTO response is as with conventional TCP. In Section 7.3:

"
   After attempting to send a loss probe, regardless of whether a loss
   probe was sent, the sender MUST re-arm the RTO timer, not the PTO
   timer, if FlightSize is not zero.  This ensures RTO recovery remains
   the last resort if TLP fails.
"

So a pure RTO response exists in the case of persistent congestion that causes losses of probes or their ACKs.
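
As a rough illustration of the Section 7.3 rule quoted above, the re-arm decision could be sketched like this. The Sender type, its field names and on_pto_expired are hypothetical, and leaving no timer armed when FlightSize is zero is an assumption of this sketch; the quoted text only covers the non-zero case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sender:
    """Hypothetical sender state; only the fields this example needs."""
    flight_size: int
    armed_timer: Optional[str] = "PTO"


def on_pto_expired(sender: Sender, probe_sent: bool) -> Optional[str]:
    """Decide which timer is armed after a PTO fires.

    probe_sent is deliberately unused: the rule applies "regardless of
    whether a loss probe was sent".
    """
    if sender.flight_size > 0:
        sender.armed_timer = "RTO"  # RTO, not PTO, so RTO stays the last resort
    else:
        sender.armed_timer = None   # assumption: nothing outstanding, no timer
    return sender.armed_timer


with_probe = on_pto_expired(Sender(flight_size=10), probe_sent=True)
without_probe = on_pto_expired(Sender(flight_size=10), probe_sent=False)
idle = on_pto_expired(Sender(flight_size=0), probe_sent=False)
```

If the probe or its ACK is then lost, the armed RTO timer eventually fires and the conventional RTO response applies.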

Martin

On Wed, Dec 16, 2020 at 11:39 AM Markku Kojo <kojo@xxxxxxxxxxxxxx> wrote:
Hi Martin,

On Tue, 15 Dec 2020, Martin Duke wrote:

> Hi Markku,
>
> Thanks for the comments. The authors will incorporate many of your
> suggestions after the IESG review.
>
> There's one thing I don't understand in your comments:
>
> " That is,
> where can an implementer find advice for correct congestion control
> actions with RACK-TLP, when:
>
> (1) a loss of rexmitted segment is detected
> (2) an entire flight of data gets dropped (and detected),
>      that is, when there is no feedback available and a timeout
>      is needed to detect the loss "
>
> Section 9.3 is the discussion about CC, and is clear that the
> implementer should use either 5681 or 6937.

Just a cite nit: RFC 5681 provides basic CC concepts and some useful CC
guidelines, but given that RACK-TLP MUST implement SACK, the algorithm in
RFC 5681 is not that useful, and an implementer quite likely follows
mainly the algorithm in RFC 6675 (and not RFC 6937 at all if not
implementing PRR).

And RFC 6675 is not mentioned in Sec. 9.3, though it is listed in
Sec. 4 (Requirements).

> You went through the 6937 case in detail.

Yes, but without correct CC actions.

> If 5681, it's pretty clear to me that in (1) this is a new loss
> detection after acknowledgment of new data, and therefore requires a
> second halving of cwnd.

Hmm, not sure what you mean by "this is a new loss detection after
acknowledgment of new data"?

But anyway, RFC 5681 gives the general principle to reduce cwnd and
ssthresh twice if a retransmission is lost but IMHO (and I believe many
who have designed new loss recovery and CC algorithms or implemented them
agree) that it is hard to get things right if only congestion control
principles are available and no algorithm.

That's why ALL mechanisms that we have include a quite detailed algorithm
with all necessary variables and actions for loss recovery and/or CC
purposes (and often also pseudocode). Like this document does for loss
detection.

So the problem is that we do not have a detailed enough algorithm or
rule that tells exactly what to do when a loss of rexmit is detected.

Even worse, the algorithms in RFC 5681 and RFC 6675 refer to
equation (4) of RFC 5681 to reduce ssthresh and cwnd when a loss
requiring a congestion control action is detected:

  (cwnd =) ssthresh = FlightSize / 2

And RFC 5681 warns that it is FlightSize, not cwnd, that must be halved
in the equation.

That is, this equation is what an implementer intuitively would use
when reading the relevant RFCs, but it gives a wrong result for
outstanding data when in fast recovery (when the sender is in
congestion avoidance and equation (4) is used to halve cwnd, it
gives a correct result).

More precisely, during fast recovery FlightSize is inflated when new
data is sent and reduced when segments are cumulatively Acked.
What the outcome is depends on the loss pattern. In the worst case,
FlightSize is significantly larger than in the beginning of the fast
recovery when FlightSize was (correctly) used to determine the halved
value for cwnd and ssthresh, i.e., equation (4) may result in
*increasing* cwnd upon detecting a loss of a rexmitted segment, instead
of further halving it.
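
A small worked example of that worst case, as a sketch only. The segment counts are invented for illustration, and "halve the current cwnd again" is just one possible reading of the intended second reduction, not something any of the cited RFCs spells out. The helper name is hypothetical; the formula is equation (4) of RFC 5681, counted in whole segments.

```python
SMSS = 1  # work in whole segments to keep the arithmetic readable


def eq4_ssthresh(flight_size, smss=SMSS):
    """RFC 5681 equation (4): ssthresh = max(FlightSize / 2, 2 * SMSS)."""
    return max(flight_size // 2, 2 * smss)


# Entering fast recovery with 100 segments outstanding.
flight_at_entry = 100
cwnd = eq4_ssthresh(flight_at_entry)

# Hypothetical worst case: new data was sent during recovery while little
# was cumulatively ACKed, so FlightSize has grown past its entry value.
flight_at_rexmit_loss = 140

# Re-applying equation (4) naively when the lost retransmission is detected.
naive_cwnd = eq4_ssthresh(flight_at_rexmit_loss)

# One possible reading of "further halving": halve the current cwnd instead.
halved_again = max(cwnd // 2, 2 * SMSS)

print(cwnd, naive_cwnd, halved_again)  # 50 70 25
```

With these made-up numbers the naive re-application raises cwnd from 50 to 70 segments, where a second halving would have given 25.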

A clever implementer might get it right with some thinking, but I am
afraid that there will be incorrect implementations with what is
currently specified. Not all implementers have spent a significant
fraction of their career solving TCP peculiarities.

> For (2), the RTO timer is still operative so
> the RTO recovery rules would still follow.

In short:

When, with a non-RACK-TLP implementation, the timer (RTO) expires:
cwnd=1 MSS, and slow start is entered.

When, with a RACK-TLP implementation, the timer (PTO) expires, normal
fast recovery is entered (unless also implementing PRR). So no RTO
recovery, as explicitly stated in Sec. 7.4.1.

This means that this document explicitly modifies standard TCP congestion
control when there are no acks coming and the retransmission timer
expires

from: RTO=SRTT+4*RTTVAR (RTO used for arming the timer)

       1. RTO timer expires
       2. cwnd=1 MSS; ssthresh=FlightSize/2; rexmit one segment
       3. Ack of rexmit sent in step 2 arrives
       4. cwnd = cwnd+1 MSS; send two segments
       ...

to:   PTO=min(2*SRTT,RTO) (PTO used for arming the timer)

       1. PTO timer expires
       2. (cwnd=1 MSS); (re)xmit one segment
       3. Ack of (re)xmit sent in step 2 arrives
       4. cwnd = ssthresh = FlightSize/2; send N=cwnd segments

For example, if FlightSize is 100 segments when the timer expires,
congestion control is the same in steps 1-3, but in step 4 the
current standard congestion control allows transmitting 2 segments,
while RACK-TLP would allow blasting 50 segments.
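
The step-4 arithmetic for both sequences can be written out as a short sketch. It models only the two sequences as described in this message, in whole segments, with hypothetical function names; it makes no claim about what the draft itself specifies.

```python
def rto_path(flight_size):
    """Steps 2-4 of the RTO sequence above; returns (cwnd, ssthresh)."""
    ssthresh = flight_size // 2  # step 2: ssthresh = FlightSize/2
    cwnd = 1                     # step 2: cwnd = 1 MSS, rexmit one segment
    cwnd += 1                    # step 4: ACK of the rexmit adds 1 MSS
    return cwnd, ssthresh


def pto_path(flight_size):
    """Steps 2-4 of the PTO sequence above; returns (cwnd, ssthresh)."""
    cwnd = ssthresh = flight_size // 2  # step 4: fast recovery is entered
    return cwnd, ssthresh


rto_cwnd, rto_ssthresh = rto_path(100)
pto_cwnd, pto_ssthresh = pto_path(100)
print(rto_cwnd, pto_cwnd)  # 2 50
```

Both paths end with the same ssthresh of 50 segments; they differ in how much may be sent right after the first ACK arrives.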

Question is: what is the justification to modify standard TCP
congestion control to use fast recovery instead of slow start for a
case where a timeout is needed to detect loss because there is no
feedback and the ack clock is lost? The draft does not give any
justification. This clearly is in conflict with items (0) and (1)
in BCP 133 (RFC 5033).

Furthermore, there is no implementation nor experimental experience
evaluating this change. The implementation with experimental experience
uses PRR (RFC 6937), which is an Experimental specification including a
novel "trick" that directs PRR fast recovery to effectively use slow
start in the case at hand.

> In other words, I am not seeing a case that requires new congestion
> control concepts except as discussed in 9.3.

See above. The change in standard congestion control for (2).

The draft intends not to change congestion control but effectively it
does without any operational evidence.

What is also missing and would be very useful:

- For (1), a hint for an implementer saying that because RACK-TLP is
   able to detect a loss of a rexmit unlike any other loss detection
   algorithm, the sender MUST react twice to congestion (and cite
   RFC 5681). And cite a document where necessary correct actions
   are described.

- For (1), advise that an implementer needs to keep track when it
   detects a loss of a retransmitted segment. Current algorithms
   in the draft detect a loss of retransmitted segment exactly in
   the same way as loss of any other segment. There seems to be
   nothing to track when a retransmission of a retransmitted segment
   takes place. Therefore, the algorithms should have additional
   actions to correctly track when such a loss is detected.

- For (1), discussion on how many times a loss of a retransmission
   of the same segment may occur and be detected. Seems that it
   may be possible to drop a rexmitted segment more than once and
   detect it also several times?  What are the implications?

- If previous is possible, then the algorithm possibly also
   may detect a loss of a new segment that was sent during fast
   recovery? This is also loss in two successive windows of data,
   and cwnd MUST be lowered twice. This discussion and necessary
   actions to track it are missing, if such scenario is possible.

> What am I missing?

Hope the above helps.

/Markku

<snipping the rest> 
