Hi Markku,
There is a ton here, but I'll try to address the top points. Hopefully they obviate the rest.
1.
[Markku]
"Hmm, not sure what you mean by "this is a new loss detection after
acknowledgment of new data"?
But anyway, RFC 5681 gives the general principle to reduce cwnd and
ssthresh twice if a retransmission is lost but IMHO (and I believe many
who have designed new loss recovery and CC algorithms or implemented them
agree) that it is hard to get things right if only congestion control
principles are available and no algorithm."
[Martin]
So 6675 Sec 5 is quite explicit that there is only one cwnd reduction per fast recovery episode, which ends once new data has been acknowledged. By definition, if a retransmission is lost it is because newer data has been acknowledged, so it's a new recovery episode. Meanwhile, during the Fast Recovery period the incoming acks implicitly remove data from the network and therefore keep flightsize low.
We can continue to go around on our interpretation of these documents, but fundamentally if there is ambiguity in 5681/6675 we should bis those RFCs rather than expand the scope of RACK.
2.
[Markku]
"
In short:
When the timer (RTO) expires in a non-RACK-TLP implementation: cwnd=1 MSS,
and slow start is entered.
When the timer (PTO) expires in a RACK-TLP implementation,
normal fast recovery is entered (unless also implementing
PRR). So no RTO recovery as explicitly stated in Sec. 7.4.1."
[Martin]
There may be a misunderstanding here. PTO is not the same as RTO, and both mechanisms exist! The loss response to a PTO is to send a probe; the RTO response is as with conventional TCP. In Section 7.3:
"
After attempting to send a loss probe, regardless of whether a loss probe was sent, the sender MUST re-arm the RTO timer, not the PTO timer, if FlightSize is not zero. This ensures RTO recovery remains the last resort if TLP fails."
So a pure RTO response exists in the case of persistent congestion that causes losses of probes or their ACKs.
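To make the ordering concrete, here is a minimal Python sketch of the Section 7.3 rule quoted above. The names (`Sender`, `on_pto_expired`, `armed_timer`, `can_send_probe`) are hypothetical and do not come from the draft; this is an illustration under those assumptions, not an implementation of RACK-TLP.

```python
from dataclasses import dataclass


@dataclass
class Sender:
    """Hypothetical sender state; field names are illustrative only."""
    flight_size: int           # outstanding data, counted in segments
    armed_timer: str = "PTO"   # which timer is currently armed
    probes_sent: int = 0

    def on_pto_expired(self, can_send_probe: bool) -> None:
        # Attempt to send a loss probe; the attempt may not result in a send.
        if can_send_probe:
            self.probes_sent += 1
        # Sec. 7.3: regardless of whether a probe was sent, re-arm the RTO
        # timer (not the PTO timer) if FlightSize is not zero.
        self.armed_timer = "RTO" if self.flight_size > 0 else "none"


s = Sender(flight_size=10)
s.on_pto_expired(can_send_probe=True)
print(s.armed_timer)
```

If the probe or its ACK is then lost, the armed RTO timer is what fires next, and the conventional RTO response applies.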
Martin
On Wed, Dec 16, 2020 at 11:39 AM Markku Kojo <kojo@xxxxxxxxxxxxxx> wrote:
Hi Martin,
On Tue, 15 Dec 2020, Martin Duke wrote:
> Hi Markku,
>
> Thanks for the comments. The authors will incorporate many of your
> suggestions after the IESG review.
>
> There's one thing I don't understand in your comments:
>
> " That is,
> where can an implementer find advice for correct congestion control
> actions with RACK-TLP, when:
>
> (1) a loss of rexmitted segment is detected
> (2) an entire flight of data gets dropped (and detected),
> that is, when there is no feedback available and a timeout
> is needed to detect the loss "
>
> Section 9.3 is the discussion about CC, and is clear that the
> implementer should use either 5681 or 6937.
Just a cite nit: RFC 5681 provides basic CC concepts and some useful CC
guidelines but given that RACK-TLP MUST implement SACK the algorithm in
RFC 5681 is not that useful and an implementer quite likely follows
mainly the algorithm in RFC 6675 (and not RFC 6937 at all if not
implementing PRR).
And RFC 6675 is not mentioned in Sec. 9.3, though it is listed in
Sec. 4 (Requirements).
> You went through the 6937 case in detail.
Yes, but without correct CC actions.
> If 5681, it's pretty clear to me that in (1) this is a new loss
> detection after acknowledgment of new data, and therefore requires a
> second halving of cwnd.
Hmm, not sure what you mean by "this is a new loss detection after
acknowledgment of new data"?
But anyway, RFC 5681 gives the general principle to reduce cwnd and
ssthresh twice if a retransmission is lost but IMHO (and I believe many
who have designed new loss recovery and CC algorithms or implemented them
agree) that it is hard to get things right if only congestion control
principles are available and no algorithm.
That's why ALL mechanisms that we have include a quite detailed algorithm
with all necessary variables and actions for loss recovery and/or CC
purposes (and often also pseudocode). Like this document does for loss
detection.
So the problem is that we do not have a detailed enough algorithm or
rule that tells exactly what to do when a loss of rexmit is detected.
Even worse, the algorithms in RFC 5681 and RFC 6675 refer to
equation (4) of RFC 5681 to reduce ssthresh and cwnd when a loss
requiring a congestion control action is detected:
(cwnd =) ssthresh = FlightSize / 2
RFC 5681 also warns that the equation must halve FlightSize,
not cwnd.
That is, this equation is what an implementer intuitively would use
when reading the relevant RFCs but it gives a wrong result for
outstanding data when in fast recovery (when the sender is in
congestion avoidance and the equation (4) is used to halve cwnd, it
gives a correct result).
More precisely, during fast recovery FlightSize is inflated when new
data is sent and reduced when segments are cumulatively Acked.
What the outcome is depends on the loss pattern. In the worst case,
FlightSize is significantly larger than in the beginning of the fast
recovery when FlightSize was (correctly) used to determine the halved
value for cwnd and ssthresh, i.e., equation (4) may result in
*increasing* cwnd upon detecting a loss of a rexmitted segment, instead
of further halving it.
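A small worked example of that worst case, in Python. The numbers are hypothetical and chosen only for illustration (they are not taken from a trace), and the function is the simplified form of equation (4) written above, without the 2*SMSS floor that RFC 5681 includes.

```python
def halve_flight_size(flight_size: int) -> int:
    """Simplified equation (4): ssthresh = FlightSize / 2, in segments."""
    return flight_size // 2


# Assumed FlightSize (in segments) when fast recovery is entered.
flight_at_entry = 40
cwnd_after_first_reduction = halve_flight_size(flight_at_entry)  # 20

# Assumed FlightSize when the loss of a rexmitted segment is detected,
# after new data sent during fast recovery has inflated it.
flight_at_rexmit_loss = 50
cwnd_from_equation_4 = halve_flight_size(flight_at_rexmit_loss)  # 25

# Applying equation (4) again raises cwnd instead of halving it further.
increased = cwnd_from_equation_4 > cwnd_after_first_reduction
print(cwnd_after_first_reduction, cwnd_from_equation_4, increased)
```

With these assumed values, the second "reduction" yields 25 segments, which is above the 20 segments set at the start of fast recovery.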
A clever implementer might get it right with some thought, but I am
afraid that there will be incorrect implementations given what is
currently specified. Not all implementers have spent a significant
fraction of their career solving TCP peculiarities.
> For (2), the RTO timer is still operative so
> the RTO recovery rules would still follow.
In short:
When the timer (RTO) expires in a non-RACK-TLP implementation: cwnd=1 MSS,
and slow start is entered.
When the timer (PTO) expires in a RACK-TLP implementation,
normal fast recovery is entered (unless also implementing
PRR). So no RTO recovery as explicitly stated in Sec. 7.4.1.
This means that this document explicitly modifies standard TCP congestion
control when there are no acks coming and the retransmission timer
expires
from: RTO=SRTT+4*RTTVAR (RTO used for arming the timer)
1. RTO timer expires
2. cwnd=1 MSS; ssthresh=FlightSize/2; rexmit one segment
3. Ack of rexmit sent in step 2 arrives
4. cwnd = cwnd+1 MSS; send two segments
...
to: PTO=min(2*SRTT,RTO) (PTO used for arming the timer)
1. PTO timer expires
2. (cwnd=1 MSS); (re)xmit one segment
3. Ack of (re)xmit sent in step 2 arrives
4. cwnd = ssthresh = FlightSize/2; send N=cwnd segments
For example, if FlightSize is 100 segments when timer expires,
congestion control is the same in steps 1-3, but in step 4 the
current standard congestion control allows transmitting 2 segments,
while RACK-TLP would allow blasting 50 segments.
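The arithmetic of the two step sequences can be sketched in Python as follows. The function names are hypothetical, everything is counted in whole segments, and the second function encodes the behaviour as I read the draft, which is the point under discussion rather than an established fact.

```python
def cwnd_after_rto_step4(flight_size: int) -> int:
    """Standard RTO recovery, steps 2-4 above, in segments."""
    cwnd = 1                      # step 2: cwnd = 1 MSS
    ssthresh = flight_size // 2   # step 2: ssthresh = FlightSize/2
    assert cwnd < ssthresh        # still in slow start for this example
    cwnd += 1                     # step 4: ACK of the rexmit arrives
    return cwnd


def cwnd_after_pto_step4(flight_size: int) -> int:
    """PTO followed by fast recovery, as read from the draft (assumption)."""
    cwnd = flight_size // 2       # step 4: cwnd = ssthresh = FlightSize/2
    return cwnd


print(cwnd_after_rto_step4(100))  # 2
print(cwnd_after_pto_step4(100))  # 50
```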
Question is: what is the justification to modify standard TCP
congestion control to use fast recovery instead of slow start for a
case where timeout is needed to detect loss because there is no
feedback and ack clock is lost? The draft does not give any
justification. This clearly is in conflict with items (0) and (1)
in BCP 133 (RFC 5033).
Furthermore, there is no implementation nor experimental experience
evaluating this change. The implementation with experimental experience
uses PRR (RFC 6937) which is an Experimental specification including a
novel "trick" that directs PRR fast recovery to effectively use slow
start in this case at hand.
> In other words, I am not seeing a case that requires new congestion
> control concepts except as discussed in 9.3.
See above. The change in standard congestion control for (2).
The draft intends not to change congestion control but effectively it
does without any operational evidence.
What is also missing and would be very useful:
- For (1), a hint for an implementer saying that because RACK-TLP is
able to detect a loss of a rexmit unlike any other loss detection
algorithm, the sender MUST react twice to congestion (and cite
RFC 5681). And cite a document where necessary correct actions
are described.
- For (1), advise that an implementer needs to keep track when it
detects a loss of a retransmitted segment. Current algorithms
in the draft detect a loss of retransmitted segment exactly in
the same way as loss of any other segment. There seems to be
nothing to track when a retransmission of a retransmitted segment
takes place. Therefore, the algorithms should have additional
actions to correctly track when such a loss is detected.
- For (1), discussion on how many times a loss of a retransmission
of the same segment may occur and be detected. Seems that it
may be possible to drop a rexmitted segment more than once and
detect it also several times? What are the implications?
- If previous is possible, then the algorithm possibly also
may detect a loss of a new segment that was sent during fast
recovery? This is also loss in two successive windows of data,
and cwnd MUST be lowered twice. This discussion and necessary
actions to track it are missing, if such scenario is possible.
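As a rough illustration of the kind of tracking meant in the items above, here is a Python sketch. Every name in it (`Segment`, `rexmit_count`, `LossTracker`, `on_segment_marked_lost`) is hypothetical and nothing like it appears in the draft; it covers only the loss of a retransmitted segment, not the case of new data lost during fast recovery.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    seq: int
    rexmit_count: int = 0   # how many times this segment was retransmitted


@dataclass
class LossTracker:
    # Sequence numbers for which a lost retransmission was detected.
    rexmit_losses: list = field(default_factory=list)

    def on_segment_marked_lost(self, seg: Segment) -> bool:
        """Return True if the copy just declared lost was a retransmission,
        i.e. a further congestion response would be needed."""
        lost_a_rexmit = seg.rexmit_count > 0
        if lost_a_rexmit:
            self.rexmit_losses.append(seg.seq)
        # Assumption: the segment is retransmitted right after being marked.
        seg.rexmit_count += 1
        return lost_a_rexmit


tracker = LossTracker()
seg = Segment(seq=1)
print(tracker.on_segment_marked_lost(seg))  # original copy lost
print(tracker.on_segment_marked_lost(seg))  # retransmission lost
```

Because the counter keeps growing, the same structure would also show how many times a retransmission of the same segment has been detected as lost.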
> What am I missing?
Hope the above helps.
/Markku
<snipping the rest>
-- last-call mailing list last-call@xxxxxxxx https://www.ietf.org/mailman/listinfo/last-call