Re: [Last-Call] [tcpm] Last Call: <draft-ietf-tcpm-rack-13.txt> (The RACK-TLP loss detection algorithm for TCP) to Proposed Standard

Martin Duke <martin.h.duke@xxxxxxxxx> · Thu, 17 Dec 2020 07:30:15 -0800

Hi Markku,

Thanks, now I understand your objections.

Martin

On Thu, Dec 17, 2020 at 12:46 AM Markku Kojo <kojo@xxxxxxxxxxxxxx> wrote:
Hi,

On Wed, 16 Dec 2020, Martin Duke wrote:

> I spent a little longer looking at the specs more carefully, and I explained (1)

> incorrectly in my last two messages. P21..29 are not Limited Transmit packets. 

Correct. Just the normal rule that allows sending new data during fast 

recovery.

> However, unless I'm missing something else, 6675 is clear that the recovery period

> does not end until the cumulative ack advances, meaning that detecting the lost

> retransmission of P1 does not trigger another MD directly.

As I have said earlier, RFC 6675 does not repeat all congestion control 

principles from RFC 5681. It definitely honors the CC principle that

requires treating a loss of a retransmission as a new congestion 

indication and another MD. I believe I am obligated to know this as a 

co-author of RFC 6675. ;)

RFC 6675 explicitly indicates that it follows RFC 5681 by stating in the 

abstract:

" ... conforms to the spirit of the current congestion control

  specification (RFC 5681 ..."

And in the intro:

   "The algorithm specified in this document is a straightforward

    SACK-based loss recovery strategy that follows the  guidelines

    set in [RFC5681] ..."

I don't think there is anything unclear in this.

RFC 6675 and all other standard congestion controls (RFC 5681 and RFC 

6582) handle a loss of a retransmission by "enforcing" RTO to detect it. 

And RTO guarantees MD. RACK-TLP changes the loss detection in this case 

and therefore the standard congestion control algorithms do not have 

actions to handle it correctly. That is the point.

BR,

/Markku

> Thanks for this exercise! It's refreshed my memory of these details after working

> on slightly different QUIC algorithms for a long time.

> 

> On Wed, Dec 16, 2020, 18:55 Martin Duke <martin.h.duke@xxxxxxxxx> wrote:

> (1) Flightsize: in RFC 6675. Section 5, Step 4.2:

>

>        (4.2) ssthresh = cwnd = (FlightSize / 2)

>

>              The congestion window (cwnd) and slow start threshold

>              (ssthresh) are reduced to half of FlightSize per [RFC5681].

>              Additionally, note that [RFC5681] requires that any

>              segments sent as part of the Limited Transmit mechanism not

>              be counted in FlightSize for the purpose of the above

>              equation.

> 

> IIUC the segments P21..P29 in your example were sent because of Limited

> Transmit, and so don't count. The flightsize for the purposes of (4.2) is

> therefore 20 after both losses, and the cwnd does not go up on the second

> loss.

> 

> (2)

> " Even a single shot burst every time there is significant loss

> event is not acceptable, not to mention continuous aggressiveness, and

> this is exactly what RFC 2914 and RFC 5033 explicitly address and warn

> about."

> 

> "Significant loss event" is the key phrase here. The intent of TLP/PTO is to

> equalize the treatment of a small packet loss whether it happened in the

> middle of a burst or the end. Why should an isolated loss be treated

> differently based on its position in the burst? This is just a logical

> extension of fast retransmit, which also modified the RTO paradigm. The

> working group consensus is that this is a feature, not a bug; you're welcome

> to feel otherwise but I suspect you're in the rough here.

> 

> Regards

> Martin

> 

> 

> On Wed, Dec 16, 2020 at 4:11 PM Markku Kojo <kojo@xxxxxxxxxxxxxx> wrote:

>       Hi Martin,

>

>       See inline.

>

>       On Wed, 16 Dec 2020, Martin Duke wrote:

>

>       > Hi Markku,

>       >

>       > There is a ton here, but I'll try to address the top points.

>       Hopefully

>       > they obviate the rest.

>

>       Sorry for being verbose. I tried to be clear but you actually

>       removed my

>       key issues/questions ;)

>

>       > 1.

>       > [Markku]

>       > "Hmm, not sure what you mean by "this is a new loss detection

>       after

>       > acknowledgment of new data"?

>       > But anyway, RFC 5681 gives the general principle to reduce

>       cwnd and

>       > ssthresh twice if a retransmission is lost but IMHO (and I

>       believe many

>       > who have designed new loss recovery and CC algorithms or

>       implemented

>       > them

>       > agree) that it is hard to get things right if only congestion

>       control

>       > principles are available and no algorithm."

>       >

>       > [Martin]

>       > So 6675 Sec 5 is quite explicit that there is only one cwnd

>       reduction

>       > per fast recovery episode, which ends once new data has been

>       > acknowledged.

>

>       To be more precise: fast recovery ends when the current window

>       becomes

>       cumulatively acknowledged, that is,

>

>       (4.1) RecoveryPoint (= HighData at the beginning) becomes

>       acknowledged

>

>       I believe we agree and you meant this although new data below

>       RecoveryPoint may become cumulatively acknowledged already

>       earlier

>       during the fast recovery. Reno loss recovery in RFC 5681 ends,

>       when

>       (any) new data has been acknowledged.

>

>       > By definition, if a retransmission is lost it is because

>       > newer data has been acknowledged, so it's a new recovery

>       episode.

>

>       Not sure where you have this definition? Newer than what are you

>       referring to?

>

>       But, yes, if a retransmission is lost with RFC 6675 algorithm,

>       it requires RTO to be detected and definitely starts a new

>       recovery

>       episode. That is, a new recovery episode is enforced by step

>       (1.a) of

>       NextSeg () which prevents retransmission of a segment that has

>       already

>       been retransmitted. If RACK-TLP is used for detecting loss with

>       RFC 6675

>       things get different in many ways, because it may detect loss of

>       a

>       retransmission. It would pretty much require an entire redesign

>       of the algorithm. For example, calculation of pipe does not

>       consider

>       segments that have been retransmitted more than once.

>

>       > Meanwhile, during the Fast Recovery period the incoming acks

>       implicitly

>       > remove data from the network and therefore keep flightsize

>       low.

>

>       Incorrect. FlightSize != pipe. Only cumulative acks remove data

>       from

>       FlightSize and new data transmitted during fast recovery inflate

>       FlightSize. How FlightSize evolves depends on loss pattern as I

>       said.

>       It is also possible that FlightSize is low; it may err in both

>       directions. A simple example can be used as a proof for the case

>       where

>       cwnd increases if a loss of retransmission is detected and

>       repaired:

>

>       RFC 6675 recovery with RACK-TLP loss detection:

>       (contains some inaccuracies because it has not been defined how

>       lost rexmits are calculated into pipe)

>

>       cwnd=20; packets P1,...,P20 in flight = current window of data

>       [P1 dropped and rexmit of P1 will also be dropped]

>

>       DupAck w/SACK for P2 arrives

>       [loss of P1 detected after one RTT from original xmit of P1]

>       [cwnd=ssthresh=10]

>       P1 is rexmitted (and it logically starts next window of data)

>

>       DupAcks w/ SACK for original P3..11 arrive

>       DupAck w/ SACK for original P12 arrives

>       [cwnd-pipe = 10-9 >=1]

>       send P21

>       DupAck w/SACK for P13 arrives

>       send P22

>       ...

>       DupAck w/SACK for P20 arrives

>       send P29

>       [FlightSize=29]

>

>       (Ack for rexmit of P1 would arrive here unless it got dropped)

>

>       DupAck w/SACK for P21 arrives

>       [loss of rexmit P1 detected after one RTT from rexmit of P1]

>

>       SET cwnd = ssthresh = FlightSize/2 = 29/2 = 14.5

>

>       CWND INCREASES when it should be at most 5 after halving it

>       twice!!!
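
The trace above can be replayed mechanically. The following is a rough Python sketch, not the RFC 6675 algorithm itself: the function name replay_trace and the simplified pipe bookkeeping (each SACK removes one segment from pipe, and one new segment is sent whenever cwnd - pipe >= 1) are assumptions made for illustration. Like the trace, it leaves open how a lost rexmit should be counted in pipe.

```python
def replay_trace(initial_window=20):
    """Replay the P1..P29 example with simplified, assumed bookkeeping."""
    flight_size = initial_window            # P1..P20 outstanding
    # First loss detection (SACK for P2): equation (4) of RFC 5681.
    cwnd = ssthresh = flight_size // 2
    first_cwnd = cwnd
    # P2 SACKed (-1), P1 marked lost (-1), P1 retransmitted (+1).
    pipe = initial_window - 1
    next_new = initial_window + 1
    new_segments = []
    # SACKs for the remaining original segments P3..P20 arrive one by one.
    for _sacked in range(3, initial_window + 1):
        pipe -= 1
        while cwnd - pipe >= 1:
            new_segments.append(next_new)   # new data sent in fast recovery
            next_new += 1
            pipe += 1
            flight_size += 1                # P1 is never cumulatively acked
    return {
        "first_cwnd": first_cwnd,
        "new_segments": new_segments,
        "flight_size": flight_size,
        # Equation (4) applied again when the lost rexmit is detected.
        "cwnd_from_equation_4": flight_size / 2,
        # Two multiplicative decreases from the original window.
        "cwnd_after_two_halvings": initial_window / 2 / 2,
    }


result = replay_trace()
print(result)
```

For the 20-segment window of the example, equation (4) yields 14.5 at the second detection, above the cwnd of 10 set by the first reduction, whereas halving twice would give 5.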

>

>       > We can continue to go around on our interpretation of these

>       documents,

>       > but fundamentally if there is ambiguity in 5681/6675 we should

>       bis

>       > those RFCs rather than expand the scope of RACK.

>

>       As I said earlier, I am not opposing bis, though 5681bis would

>       not

>       be needed, I think.

>

>       But let me repeat: if we publish RACK-TLP now without necessary

>       warnings

>       or without a correct congestion control algorithm someone will try

>       to

>       implement RACK-TLP with RFC 6675 and it will be a total mess.

>       The

>       behavior will be unpredictable and quite likely unsafe

>       congestion

>       control behavior.

>

>       > 2.

>       > [Markku]

>       > " In short:

>       > When with a non-RACK-TLP implementation timer (RTO) expires:

>       cwnd=1

>       > MSS,

>       > and slow start is entered.

>       > When with a RACK-TLP implementation timer (PTO) expires,

>       > normal fast recovery is entered (unless implementing

>       > also PRR). So no RTO recovery as explicitly stated in Sec.

>       7.4.1."

>       >

>       > [Martin]

>       > There may be a misunderstanding here. PTO is not the same as

>       RTO, and

>       > both mechanisms exist! The loss response to a PTO is to send a

>       probe;

>       > the RTO response is as with conventional TCP. In Section 7.3:

>

>       No, I don't think I misunderstood. If you call timeout with

>       another name, it is still timeout. And congestion control does

>       not

>       consider which segments to send (SND.UNA vs. probe w/ higher

>       sequence

>       number), only how much is sent.

>

>       You ignored my major point where I decoupled congestion control

>       from loss

>       detection and loss recovery and compared RFC 5681 behavior to

>       RACK-TLP

>       behavior in exactly the same scenario where an entire flight is

>       lost and

>       timer expires.

>

>       Please comment why congestion control behavior is allowed to be

>       radically

>       different in these two implementations?

>

>       RFC 5681 & RFC 6298 timeout:

>

>               RTO=SRTT+4*RTTVAR (RTO used for arming the timer)

>              1. RTO timer expires

>              2. cwnd=1 MSS; ssthresh=FlightSize/2; rexmit one segment

>              3. Ack of rexmit sent in step 2 arrives

>              4. cwnd = cwnd+1 MSS; send two segments

>              ...

>

>       RACK-TLP timeout:

>

>               PTO=min(2*SRTT,RTO) (PTO used for arming the timer)

>              1. PTO timer expires

>              2. (cwnd=1 MSS); (re)xmit one segment

>              3. Ack of (re)xmit sent in step 2 arrives

>              4. cwnd = ssthresh = FlightSize/2; send N=cwnd segments

>

>       If FlightSize is 100 segments when timer expires, congestion

>       control is

>       the same in steps 1-3, but in step 4 the standard congestion

>       control

>       allows transmitting 2 segments, while RACK-TLP would allow

>       blasting 50 segments.
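
The two timeout sequences above can be put side by side in a few lines. This is only a sketch of the steps as listed in this message: the function names are made up for illustration, window sizes are counted in whole segments, times are in milliseconds, and the timer formulas are the simplified forms quoted above rather than the full text of RFC 6298 or the draft.

```python
def rto_ms(srtt_ms, rttvar_ms):
    """RTO = SRTT + 4*RTTVAR, the simplified form quoted in the message."""
    return srtt_ms + 4 * rttvar_ms


def pto_ms(srtt_ms, rttvar_ms):
    """PTO = min(2*SRTT, RTO), as quoted in the message."""
    return min(2 * srtt_ms, rto_ms(srtt_ms, rttvar_ms))


def segments_allowed_after_rto(flight_size):
    """RFC 5681 / RFC 6298 sequence: slow start from one segment."""
    ssthresh = max(flight_size // 2, 2)   # step 2
    cwnd = 1                              # step 2: rexmit one segment
    cwnd = min(cwnd + 1, ssthresh)        # step 4: ACK arrives, cwnd grows by 1
    return cwnd


def segments_allowed_after_pto(flight_size):
    """RACK-TLP sequence as described above: fast recovery after the probe."""
    cwnd = 1                              # step 2: one probe segment
    cwnd = ssthresh = flight_size // 2    # step 4: ACK of probe arrives
    return cwnd


print(segments_allowed_after_rto(100), segments_allowed_after_pto(100))
print(rto_ms(100, 50), pto_ms(100, 50))
```

With FlightSize at 100 segments, the first function allows 2 segments in step 4 and the second allows 50, which is the difference the question is about.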

>

>       > After attempting to send a loss probe, regardless of whether a

>       loss

>       >    probe was sent, the sender MUST re-arm the RTO timer, not

>       the PTO

>       >    timer, if FlightSize is not zero.  This ensures RTO

>       recovery remains

>       >    the last resort if TLP fails.

>       > "

>

>       This does not prevent the above RACK-TLP behavior from getting

>       realized.

>

>       > So a pure RTO response exists in the case of persistent

>       congestion that

>       > causes losses of probes or their ACKs.

>

>       Yes, RTO response exists BUT only after RACK-TLP at least once

>       blasts the

>       network. It may well be that with smaller windows RACK-TLP is

>       successful

>       during its TLP initiated overly aggressive "fast recovery" and

>       never

>       enters RTO recovery because it may detect and repair also loss

>       of

>       rexmits. That is, it continues at too high rate even if lost

>       rexmits

>       indicate that congestion persists in successive windows of data.

>       And

>       worse, it is successful because it pushes away other compatible

>       TCP

>       flows by being too aggressive and unfair.

>

>       Even a single shot burst every time there is significant loss

>       event is not acceptable, not to mention continuous

>       aggressiveness, and

>       this is exactly what RFC 2914 and RFC 5033 explicitly address

>       and warn

>       about.

>

>       Are we ignoring these BCPs that have IETF consensus?

>

>       And the other important question I'd like to have answered:

>

>       What is the justification to modify standard TCP congestion

>       control to

>       use fast recovery instead of slow start for a case where timeout

>       is

>       needed to detect the packet losses because there is no feedback

>       and ack

>       clock is lost? RACK-TLP explicitly instructs to do so in Sec.

>       7.4.1.

>

>       As I noted: based on what is written in the draft it does not

>       intend to

>       change congestion control but effectively it does.

>

>       /Markku

>

>       > Martin

>       >

>       >

>       > On Wed, Dec 16, 2020 at 11:39 AM Markku Kojo

>       <kojo@xxxxxxxxxxxxxx>

>       > wrote:

>       >       Hi Martin,

>       >

>       >       On Tue, 15 Dec 2020, Martin Duke wrote:

>       >

>       >       > Hi Markku,

>       >       >

>       >       > Thanks for the comments. The authors will incorporate

>       >       many of your

>       >       > suggestions after the IESG review.

>       >       >

>       >       > There's one thing I don't understand in your comments:

>       >       >

>       >       > " That is,

>       >       > where can an implementer find advice for correct

>       >       congestion control

>       >       > actions with RACK-TLP, when:

>       >       >

>       >       > (1) a loss of rexmitted segment is detected

>       >       > (2) an entire flight of data gets dropped (and

>       detected),

>       >       >      that is, when there is no feedback available and

>       a

>       >       timeout

>       >       >      is needed to detect the loss "

>       >       >

>       >       > Section 9.3 is the discussion about CC, and is clear

>       that

>       >       the

>       >       > implementer should use either 5681 or 6937.

>       >

>       >       Just a cite nit: RFC 5681 provides basic CC concepts and

>       >       some useful CC

>       >       guidelines but given that RACK-TLP MUST implement SACK

>       the

>       >       algorithm in

>       >       RFC 5681 is not that useful and an implementer quite

>       likely

>       >       follows

>       >       mainly the algorithm in RFC 6675 (and not RFC 6937 at

>       all

>       >       if not

>       >       implementing PRR).

>       >       And RFC 6675 is not mentioned in Sec 9.3, though it is

>       >       listed in the

>       >       Sec. 4 (Requirements).

>       >

>       >       > You went through the 6937 case in detail.

>       >

>       >       Yes, but without correct CC actions.

>       >

>       >       > If 5681, it's pretty clear to me that in (1) this is a

>       >       new loss

>       >       > detection after acknowledgment of new data, and

>       therefore

>       >       requires a

>       >       > second halving of cwnd.

>       >

>       >       Hmm, not sure what you mean by "this is a new loss

>       >       detection after

>       >       acknowledgment of new data"?

>       >       But anyway, RFC 5681 gives the general principle to

>       reduce

>       >       cwnd and

>       >       ssthresh twice if a retransmission is lost but IMHO (and

>       I

>       >       believe many

>       >       who have designed new loss recovery and CC algorithms or

>       >       implemented them

>       >       agree) that it is hard to get things right if only

>       >       congestion control

>       >       principles are available and no algorithm.

>       >       That's why ALL mechanisms that we have include a quite

>       >       detailed algorithm

>       >       with all necessary variables and actions for loss

>       recovery

>       >       and/or CC

>       >       purposes (and often also pseudocode). Like this document

>       >       does for loss

>       >       detection.

>       >

>       >       So the problem is that we do not have a detailed enough

>       >       algorithm or

>       >       rule that tells exactly what to do when a loss of rexmit

>       is

>       >       detected.

>       >       Even worse, the algorithms in RFC 5681 and RFC 6675

>       refer

>       >       to

>       >       equation (4) of RFC 5681 to reduce ssthresh and cwnd

>       when a

>       >       loss

>       >       requiring a congestion control action is detected:

>       >

>       >         (cwnd =) ssthresh = FlightSize / 2

>       >

>       >       And RFC 5681 gives a warning not to halve cwnd in the

>       >       equation but

>       >       FlightSize.

>       >

>       >       That is, this equation is what an implementer

>       intuitively

>       >       would use

>       >       when reading the relevant RFCs but it gives a wrong

>       result

>       >       for

>       >       outstanding data when in fast recovery (when the sender

>       is

>       >       in

>       >       congestion avoidance and the equation (4) is used to

>       halve

>       >       cwnd, it

>       >       gives a correct result).

>       >       More precisely, during fast recovery FlightSize is

>       inflated

>       >       when new

>       >       data is sent and reduced when segments are cumulatively

>       >       Acked.

>       >       What the outcome is depends on the loss pattern. In the

>       >       worst case,

>       >       FlightSize is significantly larger than in the beginning

>       of

>       >       the fast

>       >       recovery when FlightSize was (correctly) used to

>       determine

>       >       the halved

>       >       value for cwnd and ssthresh, i.e., equation (4) may

>       result

>       >       in

>       >       *increasing* cwnd upon detecting a loss of a rexmitted

>       >       segment, instead

>       >       of further halving it.

>       >

>       >       A clever implementer might have no problem getting it

>       right

>       >       with some

>       >       thinking but I am afraid that there will be incorrect

>       >       implementations

>       >       with what is currently specified. Not all implementers

>       have

>       >       spent

>       >       significant fraction of their career in solving TCP

>       >       peculiarities.

>       >

>       >       > For (2), the RTO timer is still operative so

>       >       > the RTO recovery rules would still follow.

>       >

>       >       In short:

>       >       When with a non-RACK-TLP implementation timer (RTO)

>       >       expires: cwnd=1 MSS,

>       >       and slow start is entered.

>       >       When with a RACK-TLP implementation timer (PTO) expires,

>       >       normal fast recovery is entered (unless implementing

>       >       also PRR). So no RTO recovery as explicitly stated in

>       Sec.

>       >       7.4.1.

>       >

>       >       This means that this document explicitly modifies

>       standard

>       >       TCP congestion

>       >       control when there are no acks coming and the

>       >       retransmission timer

>       >       expires

>       >

>       >       from: RTO=SRTT+4*RTTVAR (RTO used for arming the timer)

>       >              1. RTO timer expires

>       >              2. cwnd=1 MSS; ssthresh=FlightSize/2; rexmit one

>       >       segment

>       >              3. Ack of rexmit sent in step 2 arrives

>       >              4. cwnd = cwnd+1 MSS; send two segments

>       >              ...

>       >

>       >       to:   PTO=min(2*SRTT,RTO) (PTO used for arming the

>       timer)

>       >              1. PTO timer expires

>       >              2. (cwnd=1 MSS); (re)xmit one segment

>       >              3. Ack of (re)xmit sent in step 2 arrives

>       >              4. cwnd = ssthresh = FlightSize/2; send N=cwnd

>       >       segments

>       >

>       >       For example, if FlightSize is 100 segments when timer

>       >       expires,

>       >       congestion control is the same in steps 1-3, but in step

>       4

>       >       the

>       >       current standard congestion control allows transmitting

>       2

>       >       segments,

>       >       while RACK-TLP would allow blasting 50 segments.

>       >

>       >       Question is: what is the justification to modify

>       standard

>       >       TCP

>       >       congestion control to use fast recovery instead of slow

>       >       start for a

>       >       case where timeout is needed to detect loss because

>       there

>       >       is no

>       >       feedback and ack clock is lost? The draft does not give

>       any

>       >       justification. This clearly is in conflict with items

>       (0)

>       >       and (1)

>       >       in BCP 133 (RFC 5033).

>       >

>       >       Furthermore, there is no implementation nor experimental

>       >       experience

>       >       evaluating this change. The implementation with

>       >       experimental experience

>       >       uses PRR (RFC 6937) which is an Experimental

>       specification

>       >       including a

>       >       novel "trick" that directs PRR fast recovery to

>       effectively

>       >       use slow

>       >       start in this case at hand.

>       >

>       >

>       >       > In other words, I am not seeing a case that requires

>       new

>       >       congestion

>       >       > control concepts except as discussed in 9.3.

>       >

>       >       See above. The change in standard congestion control for

>       >       (2).

>       >       The draft intends not to change congestion control but

>       >       effectively it

>       >       does without any operational evidence.

>       >

>       >       What is also missing and would be very useful:

>       >

>       >       - For (1), a hint for an implementer saying that because

>       >       RACK-TLP is

>       >          able to detect a loss of a rexmit unlike any other

>       loss

>       >       detection

>       >          algorithm, the sender MUST react twice to congestion

>       >       (and cite

>       >          RFC 5681). And cite a document where necessary

>       correct

>       >       actions

>       >          are described.

>       >

>       >       - For (1), advise that an implementer needs to keep

>       track

>       >       when it

>       >          detects a loss of a retransmitted segment. Current

>       >       algorithms

>       >          in the draft detect a loss of retransmitted segment

>       >       exactly in

>       >          the same way as loss of any other segment. There

>       seems

>       >       to be

>       >          nothing to track when a retransmission of a

>       >       retransmitted segment

>       >          takes place. Therefore, the algorithms should have

>       >       additional

>       >          actions to correctly track when such a loss is

>       detected.

>       >

>       >       - For (1), discussion on how many times a loss of a

>       >       retransmission

>       >          of the same segment may occur and be detected. Seems

>       >       that it

>       >          may be possible to drop a rexmitted segment more than

>       >       once and

>       >          detect it also several times?  What are the

>       >       implications?

>       >

>       >       - If previous is possible, then the algorithm possibly

>       also

>       >          may detect a loss of a new segment that was sent

>       during

>       >       fast

>       >          recovery? This is also loss in two successive windows

>       of

>       >       data,

>       >          and cwnd MUST be lowered twice. This discussion and

>       >       necessary

>       >          actions to track it are missing, if such scenario is

>       >       possible.
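
To make the tracking requested in the list above concrete, here is one possible shape for it in Python. Everything in it is hypothetical: the Segment and Sender classes, the retransmitted flag, and in particular the rule of halving the current ssthresh (rather than recomputing from FlightSize) when a lost rexmit is detected are assumptions for illustration. None of this is taken from the draft, RFC 5681 or RFC 6675.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int
    retransmitted: bool = False    # hypothetical per-segment flag


@dataclass
class Sender:
    cwnd: float                    # in segments
    ssthresh: float                # in segments
    in_recovery: bool = False
    reductions: int = 0            # congestion responses taken so far

    def on_loss_detected(self, seg, flight_size):
        """Called when loss detection marks seg as lost."""
        if not self.in_recovery:
            # First loss in this window: equation (4) of RFC 5681.
            self.in_recovery = True
            self.ssthresh = self.cwnd = max(flight_size / 2, 2)
            self.reductions += 1
        elif seg.retransmitted:
            # Assumed rule: a lost rexmit is a new congestion indication.
            # Halve the current ssthresh, since FlightSize may be
            # inflated by new data sent during fast recovery.
            self.ssthresh = self.cwnd = max(self.ssthresh / 2, 2)
            self.reductions += 1
        # The segment is about to be sent again, so remember that.
        seg.retransmitted = True


sender = Sender(cwnd=20, ssthresh=64)
p1 = Segment(seq=1)
sender.on_loss_detected(p1, flight_size=20)   # original P1 lost
sender.on_loss_detected(p1, flight_size=29)   # rexmit of P1 lost
print(sender.cwnd, sender.reductions)
```

Under these assumptions, the P1 example from earlier in the thread ends with cwnd at 5 after two reductions. A loss of a segment that has not yet been retransmitted, detected during the same recovery, causes no further reduction.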

>       >

>       >       > What am I missing?

>       >

>       >       Hope the above helps.

>       >

>       >       /Markku

>       >

>       >

>       > <snipping the rest>

>       >

>       >

> 

> 

>