Thanks Gerrit.
I think it would be great to continue to think about how path variations
lead to odd transport behaviours.
I also think we could (even should) explore new CC methods that have
different responses, but for TFRC, the behaviour needs (by definition)
to be similar to TCP.
After some thinking, I came to the conclusion that complexities can be
introduced by queuing strategies designed to support differentiated QoS,
by load-balancing, and by lower-layer mechanisms (such as mobility
hand-overs, link-layer retransmission, wireless propagation effects),
etc. These effects can change the loss rate and round trip time, and are
not necessarily the same on the forward and return paths. However,
although this impacts TFRC, these effects would also impact performance
of any flow using TCP.
In summary, as regards the ID, I think we should say less on the
specifics, and simply indicate that a bad RTT estimate can result in odd
behaviour for various reasons.
Gorry
On 11/11/2010 12:11, Gerrit Renker wrote:
Michael,
thank you very much indeed for your review and the helpful comments.
I am working with Gorry on a new revision to incorporate your and
Eddie's comments. Before submitting this, I would like to turn to
the one section which you have correctly identified as guesswork.
par 5: the problem described by the first two sentences: "The fifth
and last problem is starvation under burst loss..." is not clear
...
really measured, it _sounds_ like guesswork to me - this just needs
some more precise language.
To better find out if I understood something wrongly or if it is just the
description which is problematic, I would like to present the details.
The problem was observed on an 802.11g link in the 2.4 GHz ISM band, which had
an average RTT of 2 ms. Wireshark traces of the TCP traffic made it clear that
there was interference on the channel (dupAcks and retransmitted packets).
TCP streaming seemed to cope with the occasional interference.
With UDP there were occasional 'holes' in the stream, which could successfully
be fixed by the application-layer FEC provided by the paraslash streamer
application.
With DCCP/CCID-3, however, the transmission occasionally "died", i.e. sending
only one packet per t_mbi=64 seconds. This was accompanied by RTT out-of-bounds
warnings at the receiver:
Jul 15 22:01:26 kernel: [ 2311.949466] dccp_sane_rtt: RTT sample 4766615 out of bounds!
Jul 15 22:01:39 kernel: [ 2324.335916] dccp_sane_rtt: RTT sample 12373169 out of bounds!
Jul 15 22:02:11 kernel: [ 2356.548447] dccp_sane_rtt: RTT sample 32193564 out of bounds!
Jul 15 22:03:15 kernel: [ 2420.760223] dccp_sane_rtt: RTT sample 64201733 out of bounds!
These messages occur when the RTT is greater than 3,000,000 microseconds. Once it was past
this bound, the values approximately doubled each time.
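To make the growth between successive warnings easier to see, the following
short Python sketch pulls the sample values out of the four log lines above and
prints each one in seconds together with its ratio to the previous sample. It is
only an illustration: the regular expression and the helper names rtt_samples
and growth_ratios are made up for this example and are not part of the kernel
or of any existing tool.

```python
import re

# The four warnings quoted above, copied verbatim.
LOG = """\
Jul 15 22:01:26 kernel: [ 2311.949466] dccp_sane_rtt: RTT sample 4766615 out of bounds!
Jul 15 22:01:39 kernel: [ 2324.335916] dccp_sane_rtt: RTT sample 12373169 out of bounds!
Jul 15 22:02:11 kernel: [ 2356.548447] dccp_sane_rtt: RTT sample 32193564 out of bounds!
Jul 15 22:03:15 kernel: [ 2420.760223] dccp_sane_rtt: RTT sample 64201733 out of bounds!
"""

# Hypothetical pattern, written only to match the message format shown above.
SAMPLE_RE = re.compile(r"RTT sample (\d+) out of bounds")


def rtt_samples(log_text):
    """Return the RTT samples (in microseconds) found in the log text."""
    return [int(m.group(1)) for m in SAMPLE_RE.finditer(log_text)]


def growth_ratios(samples):
    """Return the ratio of each sample to the one before it."""
    return [later / earlier for earlier, later in zip(samples, samples[1:])]


samples = rtt_samples(LOG)
print(f"{samples[0] / 1e6:.2f} s")
for usec, ratio in zip(samples[1:], growth_ratios(samples)):
    print(f"{usec / 1e6:.2f} s  (x{ratio:.2f} over previous sample)")
```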
These are the facts; the rest is interpretation, an attempt to figure out exactly what happened.
I have considered whether this is a bug in the implementation, but think it unlikely (all code
is open source and publicly available, and it has also been checked a few times).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here is what I think happened:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* first packet gets sent and arrives with normal link delay,
* second packet with a CCVal difference between 1..4 is sent, but is delayed
for more than 4 times the average RTT,
* nofeedback timer at the sender is triggered after 4*RTT, halving X / doubling t_ipi,
* there are now two different rates of change:
- X is halved immediately (step reduction),
- the RTT however passes through the low-pass filter
RTT' = 0.9 * RTT + 0.1 * sample
so that sample=10RTT means RTT'=1.9 RTT, sample=100RTT means RTT'=10.9RTT etc,
* since RTT' is used for the CCVal window counter value, the change-rate for the
CCVal window counter is also slower than the change-rate of X,
* the receiver then "sees" a larger inter-packet gap caused by the immediate change of X,
accompanied by an almost unchanged rate of change for the CCVal values,
* this has the effect of doubling the sampled RTT at the receiver,
* the receiver cannot counteract sudden changes in the RTT, since it typically has
fewer usable samples than the sender, so the effective RTT increases as well.
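The two change rates in the list above can be compared with a few lines of
Python. This is only a sketch of the arithmetic, not the kernel code: the
function names are hypothetical, the RTT and X are normalised to 1, and the
filter weights 0.9/0.1 are the ones quoted above. The factor of 4 is added
simply because it is the delay at which the nofeedback timer fires.

```python
def filtered_rtt(rtt, sample, weight=0.9):
    """Low-pass filter from the text: RTT' = 0.9 * RTT + 0.1 * sample."""
    return weight * rtt + (1.0 - weight) * sample


def halve_rate(x):
    """Step reduction of X applied when the nofeedback timer expires."""
    return x / 2.0


rtt = 1.0  # normalised average RTT
x = 1.0    # normalised sending rate X

for factor in (4, 10, 100):
    new_rtt = filtered_rtt(rtt, factor * rtt)
    print(f"sample = {factor:>3} RTT -> RTT' = {new_rtt:.1f} RTT, "
          f"X after one timer expiry = {halve_rate(x):.1f} X")
```

For samples of 10 RTT and 100 RTT this reproduces the 1.9 RTT and 10.9 RTT
given above, while X drops to half in a single step regardless of the sample.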
I think that the different change rates are a general problem, and this would be what the
draft concentrates on.
But why did the RTT samples double each time:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* if the receiver sees a doubling of the RTT, it sends feedback only at half the speed,
* if this process happens just twice, the receiver sends feedback roughly every 4*RTT, which
is enough to trigger the nofeedback timer at the sender again,
* which then causes the process to start over again, until finally sending 1 packet / 64 seconds.
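The arithmetic behind this loop can be sketched as follows. The helper
doublings_needed is hypothetical, and the starting inter-packet interval of
2 ms (one packet per RTT) is purely an assumption for illustration, since the
actual sending rate is not recorded here. The 2 ms RTT and t_mbi = 64 seconds
are taken from the description above.

```python
def doublings_needed(start, limit):
    """Count how many times `start` must be doubled to reach at least `limit`."""
    count = 0
    value = start
    while value < limit:
        value *= 2.0
        count += 1
    return count


RTT = 0.002            # seconds, average RTT reported for the link
T_MBI = 64.0           # seconds, maximum back-off interval
ASSUMED_T_IPI = 0.002  # seconds, hypothetical starting inter-packet interval

# Doublings of the receiver's feedback interval until it reaches 4*RTT,
# the point at which the sender's nofeedback timer expires.
feedback_doublings = doublings_needed(RTT, 4 * RTT)

# Halvings of X (doublings of t_ipi) until one packet per t_mbi is reached.
rate_halvings = doublings_needed(ASSUMED_T_IPI, T_MBI)

print(f"feedback interval reaches 4*RTT after {feedback_doublings} doublings")
print(f"t_ipi reaches t_mbi after {rate_halvings} doublings (assumed start)")
```

Under these assumptions only a small number of timer expiries separate normal
operation from the one-packet-per-64-seconds floor.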
I have lost the traces, but believe that the problem can be reproduced with standard 2.4 GHz access
points in environments where there is contention on the ISM band and other interference
from DECT cordless phones, Bluetooth, and microwave ovens.
Gerrit