RACK-TLP intentionally decouples loss detection ...
As mentioned in the Figure 1 caption, RFC5681 mandates a principle that
loss in two successive windows of data, or the loss of a
retransmission, should be taken as two indications of congestion, and
therefore reacted to separately. However, an implementation of the RFC6675 pipe
algorithm may not directly account for these newly detected congestion
events properly. PRR [RFCxxxx] is RECOMMENDED for the specific
I agree that we should have a note in this RFC about congestion control action upon detecting lost retransmission(s).
From: tcpm <tcpm-bounces@xxxxxxxx> On Behalf Of Martin Duke
Sent: Thursday, December 17, 2020 7:30 AM
To: Markku Kojo <kojo@xxxxxxxxxxxxxx>
Cc: tcpm@xxxxxxxx Extensions <tcpm@xxxxxxxx>; draft-ietf-tcpm-rack@xxxxxxxx; Michael Tuexen <tuexen@xxxxxxxxxxxxxx>; draft-ietf-tcpm-rack.all@xxxxxxxx; Last Call <last-call@xxxxxxxx>; tcpm-chairs <tcpm-chairs@xxxxxxxx>
Subject: [EXTERNAL] Re: [tcpm] Last Call: <draft-ietf-tcpm-rack-13.txt> (The RACK-TLP loss detection algorithm for TCP) to Proposed Standard
Hi Markku,
Thanks, now I understand your objections.
Martin
On Thu, Dec 17, 2020 at 12:46 AM Markku Kojo <kojo@xxxxxxxxxxxxxx> wrote:
Hi,
On Wed, 16 Dec 2020, Martin Duke wrote:
> I spent a little longer looking at the specs more carefully, and I explained (1)
> incorrectly in my last two messages. P21..29 are not Limited Transmit packets.
Correct. Just the normal rule that allows sending new data during fast
recovery.
> However, unless I'm missing something else, 6675 is clear that the recovery period
> does not end until the cumulative ack advances, meaning that detecting the lost
> retransmission of P1 does not trigger another MD directly.
As I have said earlier, RFC 6675 does not repeat all congestion control
principles from RFC 5681. It definitely honors the CC principle that
requires treating a loss of a retransmission as a new congestion
indication and another MD. I believe I am obligated to know this as a
co-author of RFC 6675. ;)
RFC 6675 explicitly indicates that it follows RFC 5681 by stating in the
abstract:
" ... conforms to the spirit of the current congestion control
specification (RFC 5681 ..."
And in the intro:
"The algorithm specified in this document is a straightforward
SACK-based loss recovery strategy that follows the guidelines
set in [RFC5681] ..."
I don't think there is anything unclear in this.
RFC 6675 and all other standard congestion controls (RFC 5681 and RFC
6582) handle a loss of a retransmission by "enforcing" an RTO to detect it.
And an RTO guarantees an MD. RACK-TLP changes the loss detection in this
case, and therefore the standard congestion control algorithms do not have
actions to handle it correctly. That is the point.
BR,
/Markku
> Thanks for this exercise! It's refreshed my memory of these details after working
> on slightly different QUIC algorithms for a long time.
>
> On Wed, Dec 16, 2020, 18:55 Martin Duke <martin.h.duke@xxxxxxxxx> wrote:
> (1) FlightSize: in RFC 6675, Section 5, Step 4.2:
>
> (4.2) ssthresh = cwnd = (FlightSize / 2)
>
> The congestion window (cwnd) and slow start threshold
> (ssthresh) are reduced to half of FlightSize per [RFC5681].
> Additionally, note that [RFC5681] requires that any
> segments sent as part of the Limited Transmit mechanism not
> be counted in FlightSize for the purpose of the above
> equation.
>
> IIUC the segments P21..P29 in your example were sent because of Limited
> Transmit, and so don't count. The flightsize for the purposes of (4.2) is
> therefore 20 after both losses, and the cwnd does not go up on the second
> loss.
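Martin's reading of Step (4.2) can be checked with a minimal Python sketch. It assumes segment units and a hypothetical `limited_transmit` count of outstanding segments that were sent under Limited Transmit; it is not code from any RFC. Martin's later message at the top of this thread corrects the premise (P21..P29 are not Limited Transmit segments), in which case nothing is excluded.

```python
def step_4_2(flight_size: int, limited_transmit: int = 0) -> float:
    """RFC 6675 Section 5, Step (4.2): ssthresh = cwnd = FlightSize / 2.

    Per RFC 5681, segments sent under Limited Transmit are not counted in
    FlightSize for this calculation. Units are segments (assumption);
    `limited_transmit` is a hypothetical parameter for this sketch.
    """
    return (flight_size - limited_transmit) / 2


# Martin's first reading: 29 outstanding, P21..P29 (9) via Limited Transmit.
print(step_4_2(29, limited_transmit=9))  # 10.0
# P21..P29 sent as ordinary fast-recovery data: nothing is excluded.
print(step_4_2(29))  # 14.5
```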
>
> (2)
> " Even a single shot burst every time there is significant loss
> event is not acceptable, not to mention continuous aggressiveness, and
> this is exactly what RFC 2914 and RFC 5033 explicitly address and warn
> about."
>
> "Significant loss event" is the key phrase here. The intent of TLP/PTO is to
> equalize the treatment of a small packet loss whether it happened in the
> middle of a burst or the end. Why should an isolated loss be treated
> differently based on its position in the burst? This is just a logical
> extension of fast retransmit, which also modified the RTO paradigm. The
> working group consensus is that this is a feature, not a bug; you're welcome
> to feel otherwise but I suspect you're in the rough here.
>
> Regards
> Martin
>
>
> On Wed, Dec 16, 2020 at 4:11 PM Markku Kojo <kojo@xxxxxxxxxxxxxx> wrote:
> Hi Martin,
>
> See inline.
>
> On Wed, 16 Dec 2020, Martin Duke wrote:
>
> > Hi Markku,
> >
> > There is a ton here, but I'll try to address the top points. Hopefully
> > they obviate the rest.
>
> Sorry for being verbose. I tried to be clear but you actually removed my
> key issues/questions ;)
>
> > 1.
> > [Markku]
> > "Hmm, not sure what you mean by "this is a new loss detection after
> > acknowledgment of new data"?
> > But anyway, RFC 5681 gives the general principle to reduce cwnd and
> > ssthresh twice if a retransmission is lost but IMHO (and I believe many
> > who have designed new loss recovery and CC algorithms or implemented
> > them agree) that it is hard to get things right if only congestion
> > control principles are available and no algorithm."
> >
> > [Martin]
> > So 6675 Sec 5 is quite explicit that there is only one cwnd reduction
> > per fast recovery episode, which ends once new data has been
> > acknowledged.
>
> To be more precise: fast recovery ends when the current window becomes
> cumulatively acknowledged, that is,
>
> (4.1) RecoveryPoint (= HighData at the beginning) becomes acknowledged
>
> I believe we agree and that you meant this, although new data below
> RecoveryPoint may already become cumulatively acknowledged earlier
> during fast recovery. Reno loss recovery in RFC 5681 ends when (any)
> new data has been acknowledged.
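The two exit conditions differ only in which sequence number the cumulative ACK has to pass. A minimal Python sketch, assuming one integer per segment and an ACK value that names the next expected segment (both simplifications for illustration, not RFC notation):

```python
def rfc6675_recovery_over(cum_ack: int, recovery_point: int) -> bool:
    """RFC 6675: a cumulative ACK for a sequence number greater than
    RecoveryPoint (HighData when recovery started) ends loss recovery."""
    return cum_ack > recovery_point


def reno_recovery_over(cum_ack: int, snd_una: int) -> bool:
    """RFC 5681 Reno: fast recovery ends on the first ACK that acknowledges
    previously unacknowledged data."""
    return cum_ack > snd_una


# P1 lost while P1..P20 are outstanding: SND.UNA = 1, RecoveryPoint = 20.
partial_ack = 5   # P1..P4 now cumulatively acknowledged
full_ack = 21     # P1..P20 cumulatively acknowledged

print(reno_recovery_over(partial_ack, 1), rfc6675_recovery_over(partial_ack, 20))
print(reno_recovery_over(full_ack, 1), rfc6675_recovery_over(full_ack, 20))
```

A partial ACK ends Reno recovery but not RFC 6675 recovery; only an ACK covering RecoveryPoint ends both.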
>
> > By definition, if a retransmission is lost it is because
> > newer data has been acknowledged, so it's a new recovery episode.
>
> Not sure where you got this definition? Newer than what are you
> referring to?
>
> But, yes, if a retransmission is lost with the RFC 6675 algorithm, it
> requires an RTO to be detected and definitely starts a new recovery
> episode. That is, a new recovery episode is enforced by step (1.a) of
> NextSeg (), which prevents retransmitting a segment that has already
> been retransmitted. If RACK-TLP is used for detecting loss with RFC
> 6675, things get different in many ways, because it may detect the
> loss of a retransmission. It would pretty much require an entire
> redesign of the algorithm. For example, the calculation of pipe does
> not consider segments that have been retransmitted more than once.
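The effect of step (1.a) can be seen in a simplified model of NextSeg() rule (1) from RFC 6675. This Python sketch assumes one integer per segment, with the hypothetical parameters `snd_una`, `high_rxt`, `sacked`, `highest_sacked` and `is_lost` standing in for the RFC's octet-based scoreboard, HighRxt and IsLost(); rules (2) to (4) are omitted.

```python
from typing import Callable, Optional, Set


def next_seg_rule1(snd_una: int, high_rxt: int, sacked: Set[int],
                   highest_sacked: int,
                   is_lost: Callable[[int], bool]) -> Optional[int]:
    """Simplified NextSeg() rule (1): return the smallest unSACKed
    segment S2 that satisfies all three criteria, or None.

      (1.a) S2 > HighRxt
      (1.b) S2 < highest SACKed segment
      (1.c) IsLost(S2)
    """
    for s2 in range(snd_una, highest_sacked):  # range bound enforces (1.b)
        if s2 in sacked:
            continue
        if s2 <= high_rxt:                     # (1.a) fails: already rexmitted
            continue
        if is_lost(s2):                        # (1.c)
            return s2
    return None
```

With P2..P21 SACKed and P1 not yet retransmitted, `next_seg_rule1(1, 0, set(range(2, 22)), 21, lambda s: s == 1)` returns 1. Once P1 has been retransmitted (HighRxt = 1), the same call with `high_rxt=1` returns `None`, even though the loss detector reports P1 lost again.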
>
> > Meanwhile, during the Fast Recovery period the incoming acks implicitly
> > remove data from the network and therefore keep flightsize low.
>
> Incorrect. FlightSize != pipe. Only cumulative acks remove data from
> FlightSize, and new data transmitted during fast recovery inflates
> FlightSize. How FlightSize evolves depends on the loss pattern, as I
> said. It is also possible that FlightSize is low; it may err in both
> directions. A simple example can be used as a proof for the case where
> cwnd increases if a loss of a retransmission is detected and repaired:
>
> RFC 6675 recovery with RACK-TLP loss detection:
> (contains some inaccuracies because it has not been defined how
> lost rexmits are calculated into pipe)
>
> cwnd=20; packets P1,...,P20 in flight = current window of data
> [P1 dropped and rexmit of P1 will also be dropped]
>
> DupAck w/SACK for P2 arrives
> [loss of P1 detected after one RTT from original xmit of P1]
> [cwnd=ssthresh=10]
> P1 is rexmitted (and it logically starts next window of data)
>
> DupAcks w/ SACK for original P3..11 arrive
> DupAck w/ SACK for original P12 arrives
> [cwnd-pipe = 10-9 >=1]
> send P21
> DupAck w/SACK for P13 arrives
> send P22
> ...
> DupAck w/SACK for P20 arrives
> send P29
> [FlightSize=29]
>
> (Ack for rexmit of P1 would arrive here unless it got dropped)
>
> DupAck w/SACK for P21 arrives
> [loss of rexmit P1 detected after one RTT from rexmit of P1]
>
> SET cwnd = ssthresh = FlightSize/2 = 29/2 = 14.5
>
> CWND INCREASES when it should be at most 5 after halving it
> twice!!!
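The arithmetic of this scenario can be replayed directly. This minimal Python sketch assumes segment units and that the sender applies FlightSize/2 literally at each loss detection, as equation (4) of RFC 5681 reads; the pipe bookkeeping is left out.

```python
def halve_flight_size(flight_size: float) -> float:
    # Equation (4) of RFC 5681 applied literally, in segments; the 2*SMSS
    # floor is omitted because it never binds in this example.
    return flight_size / 2


# cwnd = 20; P1..P20 outstanding, so FlightSize = 20.
flight_size = 20

# DupAck w/SACK for P2: loss of P1 detected, first reduction.
cwnd = ssthresh = halve_flight_size(flight_size)   # 10.0
after_first_loss = cwnd

# P1 is retransmitted (FlightSize unchanged: no new data).
# SACKs for P12..P20 each release one new segment, P21..P29, and no
# cumulative ACK arrives, so FlightSize grows by 9.
flight_size += 9                                    # 29

# DupAck w/SACK for P21: RACK-TLP declares the rexmit of P1 lost.
cwnd = ssthresh = halve_flight_size(flight_size)   # 14.5
after_lost_rexmit = cwnd

# What halving twice would give instead.
halved_twice = after_first_loss / 2                 # 5.0

print(after_first_loss, after_lost_rexmit, halved_twice)
```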
>
> > We can continue to go around on our interpretation of these documents,
> > but fundamentally if there is ambiguity in 5681/6675 we should bis
> > those RFCs rather than expand the scope of RACK.
>
> As I said earlier, I am not opposing a bis, though a 5681bis would not
> be needed, I think.
>
> But let me repeat: if we publish RACK-TLP now without the necessary
> warnings or a correct congestion control algorithm, someone will try
> to implement RACK-TLP with RFC 6675 and it will be a total mess. The
> behavior will be unpredictable, and the congestion control behavior
> quite likely unsafe.
>
> > 2.
> > [Markku]
> > " In short:
> > When with a non-RACK-TLP implementation timer (RTO) expires: cwnd=1
> > MSS, and slow start is entered.
> > When with a RACK-TLP implementation timer (PTO) expires,
> > normal fast recovery is entered (unless implementing
> > also PRR). So no RTO recovery as explicitly stated in Sec. 7.4.1."
> >
> > [Martin]
> > There may be a misunderstanding here. PTO is not the same as RTO, and
> > both mechanisms exist! The loss response to a PTO is to send a probe;
> > the RTO response is as with conventional TCP. In Section 7.3:
>
> No, I don't think I misunderstood. If you call a timeout by another
> name, it is still a timeout. And congestion control does not consider
> which segments to send (SND.UNA vs. a probe with a higher sequence
> number), only how much is sent.
>
> You ignored my major point where I decoupled congestion control from
> loss detection and loss recovery and compared RFC 5681 behavior to
> RACK-TLP behavior in exactly the same scenario where an entire flight
> is lost and the timer expires.
>
> Please explain why congestion control behavior is allowed to be
> radically different in these two implementations.
>
> RFC 5681 & RFC 6298 timeout:
>
> RTO=SRTT+4*RTTVAR (RTO used for arming the timer)
> 1. RTO timer expires
> 2. cwnd=1 MSS; ssthresh=FlightSize/2; rexmit one segment
> 3. Ack of rexmit sent in step 2 arrives
> 4. cwnd = cwnd+1 MSS; send two segments
> ...
>
> RACK-TLP timeout:
>
> PTO=min(2*SRTT,RTO) (PTO used for arming the timer)
> 1. PTO timer expires
> 2. (cwnd=1 MSS); (re)xmit one segment
> 3. Ack of (re)xmit sent in step 2 arrives
> 4. cwnd = ssthresh = FlightSize/2; send N=cwnd segments
>
> If FlightSize is 100 segments when the timer expires, congestion
> control is the same in steps 1-3, but in step 4 the standard
> congestion control allows transmitting 2 segments, while RACK-TLP
> would allow blasting 50 segments.
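The two step sequences above can be put side by side for FlightSize = 100. This is a minimal Python sketch in segment units (so SMSS is 1); the RACK-TLP function encodes Markku's reading of Sec. 7.4.1 of the draft as described in this message, not text taken from the draft itself.

```python
from typing import Tuple

SMSS = 1  # window accounting in segments, so SMSS is one unit (assumption)


def rfc5681_timeout_path(flight_size: int) -> Tuple[float, int]:
    """RFC 5681 / RFC 6298 retransmission timeout, steps 1-4.

    Returns (ssthresh, cwnd) after the ACK of the single retransmission.
    """
    ssthresh = max(flight_size / 2, 2 * SMSS)  # equation (4) of RFC 5681
    cwnd = 1 * SMSS                            # loss window after the RTO
    cwnd += SMSS                               # slow start: +1 SMSS for the ACK
    return ssthresh, cwnd


def rack_tlp_timeout_path(flight_size: int) -> Tuple[float, float]:
    """RACK-TLP path as read in this message: the ACK of the probe leads
    to fast recovery with cwnd = ssthresh = FlightSize / 2."""
    ssthresh = max(flight_size / 2, 2 * SMSS)
    cwnd = ssthresh
    return ssthresh, cwnd


for name, path in (("RFC 5681", rfc5681_timeout_path),
                   ("RACK-TLP", rack_tlp_timeout_path)):
    ssthresh, cwnd = path(100)
    print(f"{name}: ssthresh={ssthresh}, segments allowed={cwnd}")
```

Both paths compute the same ssthresh; they differ only in whether cwnd restarts from the loss window or jumps straight to ssthresh.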
>
> > " After attempting to send a loss probe, regardless of whether a loss
> > probe was sent, the sender MUST re-arm the RTO timer, not the PTO
> > timer, if FlightSize is not zero. This ensures RTO recovery remains
> > the last resort if TLP fails. "
>
> This does not prevent the above RACK-TLP behavior from being realized.
>
> > So a pure RTO response exists in the case of persistent congestion that
> > causes losses of probes or their ACKs.
>
> Yes, an RTO response exists, BUT only after RACK-TLP has blasted the
> network at least once. It may well be that with smaller windows
> RACK-TLP is successful during its TLP-initiated, overly aggressive
> "fast recovery" and never enters RTO recovery, because it may detect
> and repair losses of rexmits as well. That is, it continues at too
> high a rate even if lost rexmits indicate that congestion persists in
> successive windows of data. And worse, it is successful because it
> pushes away other compatible TCP flows by being too aggressive and
> unfair.
>
> Even a single shot burst every time there is significant loss event is
> not acceptable, not to mention continuous aggressiveness, and this is
> exactly what RFC 2914 and RFC 5033 explicitly address and warn about.
>
> Are we ignoring these BCPs that have IETF consensus?
>
> And the other important question I'd like to have an answer:
>
> What is the justification to modify standard TCP congestion control to
> use fast recovery instead of slow start for a case where a timeout is
> needed to detect the packet losses because there is no feedback and
> the ack clock is lost? RACK-TLP explicitly instructs to do so in Sec.
> 7.4.1.
>
> As I noted: based on what is written in the draft, it does not intend
> to change congestion control, but effectively it does.
>
> /Markku
>
> > Martin
> >
> >
> > On Wed, Dec 16, 2020 at 11:39 AM Markku Kojo <kojo@xxxxxxxxxxxxxx>
> > wrote:
> > Hi Martin,
> >
> > On Tue, 15 Dec 2020, Martin Duke wrote:
> >
> > > Hi Markku,
> > >
> > > Thanks for the comments. The authors will incorporate many of your
> > > suggestions after the IESG review.
> > >
> > > There's one thing I don't understand in your comments:
> > >
> > > " That is,
> > > where can an implementer find advice for correct congestion control
> > > actions with RACK-TLP, when:
> > >
> > > (1) a loss of rexmitted segment is detected
> > > (2) an entire flight of data gets dropped (and detected),
> > >     that is, when there is no feedback available and a timeout
> > >     is needed to detect the loss "
> > >
> > > Section 9.3 is the discussion about CC, and is clear that the
> > > implementer should use either 5681 or 6937.
> >
> > Just a cite nit: RFC 5681 provides basic CC concepts and some useful CC
> > guidelines, but given that RACK-TLP MUST implement SACK, the algorithm
> > in RFC 5681 is not that useful and an implementer quite likely follows
> > mainly the algorithm in RFC 6675 (and not RFC 6937 at all if not
> > implementing PRR).
> > And RFC 6675 is not mentioned in Sec 9.3, though it is listed in
> > Sec. 4 (Requirements).
> >
> > > You went through the 6937 case in detail.
> >
> > Yes, but without correct CC actions.
> >
> > > If 5681, it's pretty clear to me that in (1) this is a new loss
> > > detection after acknowledgment of new data, and therefore requires a
> > > second halving of cwnd.
> >
> > Hmm, not sure what you mean by "this is a new loss detection after
> > acknowledgment of new data"?
> > But anyway, RFC 5681 gives the general principle to reduce cwnd and
> > ssthresh twice if a retransmission is lost but IMHO (and I believe many
> > who have designed new loss recovery and CC algorithms or implemented
> > them agree) that it is hard to get things right if only congestion
> > control principles are available and no algorithm.
> > That's why ALL mechanisms that we have include a quite detailed
> > algorithm with all necessary variables and actions for loss recovery
> > and/or CC purposes (and often also pseudocode), like this document
> > does for loss detection.
> >
> > So the problem is that we do not have a detailed enough algorithm or
> > rule that tells exactly what to do when a loss of rexmit is detected.
> > Even worse, the algorithms in RFC 5681 and RFC 6675 refer to
> > equation (4) of RFC 5681 to reduce ssthresh and cwnd when a loss
> > requiring a congestion control action is detected:
> >
> >   (cwnd =) ssthresh = FlightSize / 2
> >
> > And RFC 5681 gives a warning not to halve cwnd in the equation but
> > FlightSize.
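The shorthand above leaves out the floor in equation (4) of RFC 5681, which reads ssthresh = max (FlightSize / 2, 2*SMSS). A minimal Python sketch in bytes, with a hypothetical sender whose cwnd is much larger than the data it actually has outstanding, to show why halving cwnd instead of FlightSize gives a different answer:

```python
def ssthresh_eq4(flight_size: int, smss: int) -> int:
    """Equation (4) of RFC 5681: ssthresh = max(FlightSize / 2, 2*SMSS).

    Byte units; floor division is a choice made for this sketch.
    """
    return max(flight_size // 2, 2 * smss)


SMSS = 1000
cwnd = 20_000          # window the sender is allowed (hypothetical)
flight_size = 8_000    # but only 8 segments are actually outstanding

print(ssthresh_eq4(flight_size, SMSS))  # 4000
print(cwnd // 2)                        # 10000: halving cwnd instead
print(ssthresh_eq4(1_500, SMSS))        # 2000: the 2*SMSS floor applies
```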
> >
> > That is, this equation is what an implementer intuitively would use
> > when reading the relevant RFCs, but it gives a wrong result for
> > outstanding data when in fast recovery (when the sender is in
> > congestion avoidance and equation (4) is used to halve cwnd, it
> > gives a correct result).
> > More precisely, during fast recovery FlightSize is inflated when new
> > data is sent and reduced when segments are cumulatively Acked.
> > What the outcome is depends on the loss pattern. In the worst case,
> > FlightSize is significantly larger than at the beginning of fast
> > recovery, when FlightSize was (correctly) used to determine the
> > halved value for cwnd and ssthresh; i.e., equation (4) may result in
> > *increasing* cwnd upon detecting a loss of a rexmitted segment,
> > instead of further halving it.
> >
> > A clever implementer might have no problem getting it right with some
> > thinking, but I am afraid that there will be incorrect implementations
> > with what is currently specified. Not all implementers have spent a
> > significant fraction of their career solving TCP peculiarities.
> >
> > > For (2), the RTO timer is still operative so
> > > the RTO recovery rules would still follow.
> >
> > In short:
> > When with a non-RACK-TLP implementation timer (RTO) expires: cwnd=1
> > MSS, and slow start is entered.
> > When with a RACK-TLP implementation timer (PTO) expires,
> > normal fast recovery is entered (unless implementing
> > also PRR). So no RTO recovery as explicitly stated in Sec. 7.4.1.
> >
> > This means that this document explicitly modifies standard TCP
> > congestion control when there are no acks coming and the
> > retransmission timer expires
> >
> > from: RTO=SRTT+4*RTTVAR (RTO used for arming the timer)
> > 1. RTO timer expires
> > 2. cwnd=1 MSS; ssthresh=FlightSize/2; rexmit one segment
> > 3. Ack of rexmit sent in step 2 arrives
> > 4. cwnd = cwnd+1 MSS; send two segments
> > ...
> >
> > to: PTO=min(2*SRTT,RTO) (PTO used for arming the timer)
> > 1. PTO timer expires
> > 2. (cwnd=1 MSS); (re)xmit one segment
> > 3. Ack of (re)xmit sent in step 2 arrives
> > 4. cwnd = ssthresh = FlightSize/2; send N=cwnd segments
> >
> > For example, if FlightSize is 100 segments when the timer expires,
> > congestion control is the same in steps 1-3, but in step 4 the
> > current standard congestion control allows transmitting 2 segments,
> > while RACK-TLP would allow blasting 50 segments.
> >
> > Question is: what is the justification to modify standard TCP
> > congestion control to use fast recovery instead of slow start for a
> > case where timeout is needed to detect loss because there is no
> > feedback and ack clock is lost? The draft does not give any
> > justification. This clearly is in conflict with items (0) and (1)
> > in BCP 133 (RFC 5033).
> >
> > Furthermore, there is neither implementation nor experimental
> > experience evaluating this change. The implementation with
> > experimental experience uses PRR (RFC 6937), which is an Experimental
> > specification including a novel "trick" that directs PRR fast
> > recovery to effectively use slow start in this case at hand.
> >
> >
> > > In other words, I am not seeing a case that requires new congestion
> > > control concepts except as discussed in 9.3.
> >
> > See above. The change in standard congestion control for (2).
> > The draft intends not to change congestion control but effectively it
> > does without any operational evidence.
> >
> > What is also missing and would be very useful:
> >
> > - For (1), a hint for an implementer saying that because RACK-TLP is
> >   able to detect a loss of a rexmit unlike any other loss detection
> >   algorithm, the sender MUST react twice to congestion (and cite
> >   RFC 5681). And cite a document where the necessary correct actions
> >   are described.
> >
> > - For (1), advise that an implementer needs to keep track of when it
> >   detects a loss of a retransmitted segment. Current algorithms in the
> >   draft detect a loss of a retransmitted segment exactly in the same
> >   way as the loss of any other segment. There seems to be nothing to
> >   track when a retransmission of a retransmitted segment takes place.
> >   Therefore, the algorithms should have additional actions to
> >   correctly track when such a loss is detected.
> >
> > - For (1), discussion on how many times a loss of a retransmission of
> >   the same segment may occur and be detected. Seems that it may be
> >   possible to drop a rexmitted segment more than once and detect it
> >   also several times? What are the implications?
> >
> > - If the previous is possible, then the algorithm possibly also may
> >   detect a loss of a new segment that was sent during fast recovery?
> >   This is also loss in two successive windows of data, and cwnd MUST
> >   be lowered twice. This discussion and the necessary actions to track
> >   it are missing, if such a scenario is possible.
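None of the specifications discussed here define the tracking asked for above. The following is a hypothetical Python sketch of one way it could be done: remember which segments were retransmitted in the current recovery episode, and when a loss of such a retransmission is detected, halve the already-reduced ssthresh rather than recomputing FlightSize/2. The class and method names are invented for illustration, and the sketch does not handle the last case above (loss of new data sent during fast recovery).

```python
class RexmitLossTracker:
    """Hypothetical sender-side bookkeeping, not taken from RACK-TLP,
    RFC 5681 or RFC 6675. Window values are in segments."""

    def __init__(self, cwnd: float) -> None:
        self.cwnd = cwnd
        self.ssthresh = float("inf")
        self.in_recovery = False
        self.retransmitted = set()  # segments retransmitted this episode

    def on_loss_detected(self, seg: int, flight_size: float) -> None:
        """Called when loss detection (e.g. RACK) marks `seg` lost."""
        if not self.in_recovery:
            # First congestion indication: equation (4) of RFC 5681.
            self.ssthresh = max(flight_size / 2, 2)
            self.cwnd = self.ssthresh
            self.in_recovery = True
        elif seg in self.retransmitted:
            # A retransmission was lost: a separate congestion indication.
            # Halve the already-reduced value instead of the current,
            # possibly inflated, FlightSize.
            self.ssthresh = max(self.ssthresh / 2, 2)
            self.cwnd = self.ssthresh
        # The sender is about to (re)transmit seg; remember that.
        self.retransmitted.add(seg)

    def on_recovery_end(self) -> None:
        """Cumulative ACK passed RecoveryPoint: start a fresh episode."""
        self.in_recovery = False
        self.retransmitted.clear()
```

Replaying the cwnd=20 example from earlier in the thread, `t = RexmitLossTracker(20); t.on_loss_detected(1, 20); t.on_loss_detected(1, 29)` leaves `t.cwnd` at 5.0 rather than 14.5. Whether a second loss of the same retransmission should trigger yet another reduction, as this sketch does, is exactly the open question raised above.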
> >
> > > What am I missing?
> >
> > Hope the above helps.
> >
> > /Markku
> >
> >
> > <snipping the rest>
> >
> >
>
>
>
-- last-call mailing list last-call@xxxxxxxx https://www.ietf.org/mailman/listinfo/last-call