Re: fifth problem in draft-ietf-dccp-tfrc-rtt-option-01

Gerrit Renker <gerrit@xxxxxxxxxxxxxx> · Tue, 21 Dec 2010 07:23:41 +0100

| 2.1: I still don't understand all of this discussion.  One concrete 
| issue with the "fifth problem":
| 
|     To the receiver this condition will look as if the inter-packet gap
|     suddenly doubled, meaning it will use samples of twice the actual
|     RTT.
| 
| I don't see why.  Say X before the loss event is 8 packets/RTT, and 
| after it is 4 packets/RTT, and RTT=1s.  Here are the window counters before:
| 
|        time   wctr
|       0.000      1
|       0.125      1
|       0.250      2
|       0.375      2
|       0.500      3
|         ...
|       1.000      5
|       1.125      5
| 
| After:
| 
|        time   wctr
|       6.000      1
|       6.250      2
|       6.500      3
| 
| So where do you get that "inter-packet gap doubling causes samples of 
| twice the actual RTT"?  You are NOT supposed to use the inter-packet gap 
| to calculate the RTT.  You are supposed to use WINDOW COUNTERS plus 
| inter-packet gaps.  And the window counters have, correctly, been 
| updated to the new sending rate: the quantity (average interpacket 
| spacing / average wctr delta), which should equal R/4, has remained the 
| same, namely the correct value of 0.25s.
|
Thank you for this counter-example.
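
To make sure I read the numbers the same way, here is a minimal sketch that
reproduces the arithmetic of the counter-example. The function name is made up
for illustration. It assumes the window counter advances once per quarter RTT
and it ignores wrap-around of the 4-bit counter.

```python
def rtt_from_wctr(trace):
    """Estimate the RTT from (time, wctr) pairs.

    Assumption: the window counter advances once per RTT/4, so the
    elapsed time divided by the counter delta approximates RTT/4.
    Counter wrap-around is ignored in this sketch.
    """
    (t_first, c_first), (t_last, c_last) = trace[0], trace[-1]
    quarter_rtt = (t_last - t_first) / (c_last - c_first)
    return 4 * quarter_rtt


# Values copied from the two tables in the quoted text (RTT = 1s).
before = [(0.000, 1), (0.125, 1), (0.250, 2), (0.375, 2), (0.500, 3)]
after = [(6.000, 1), (6.250, 2), (6.500, 3)]

print(rtt_from_wctr(before), rtt_from_wctr(after))
```

Both traces give 0.25s per counter step, so the estimate is the same before
and after the rate is halved, even though the inter-packet gap has doubled.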

I think that Michael, Gorry, and you are right -- there should not be room
for speculation in the draft, and causes for this behaviour should be more
carefully tested and analysed - perhaps as a separate research problem.

Below I'll try my best to recollect the facts; hopefully you can help to rule
out a few more factors.

Meanwhile, regarding the draft, I think it is best to leave out speculation
and to follow Gorry's advice in his dccp@ietf posting of 20th November:

 "In summary, as regards the ID, I think we should say less on the
  specifics, and simply indicate that a bad RTT can result in odd
  behaviour for various reasons."
  http://www.ietf.org/mail-archive/web/dccp/current/msg03762.html

I still hope to find an explanation for this behaviour. The above posting
mentioned interactions with other mechanisms (QoS, load-balancing,
mobility, ...) that might trigger the same conditions and behaviour.

First, I checked again whether there is a bug in the implementation:
 * samples are accepted if 1 <= CCVal difference <= 4, or
 * if 4 < CCVal difference and RTT_estimate/2 < sample < RTT_estimate
   (this is an optimisation in order to get more samples: if the RTT is low,
    many packets will be sent with a CCVal difference of 5)
 * the implementation looks correct and has been in use for several years.
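
The acceptance rule above can be written down as a small sketch. This is not
the kernel code: the function and parameter names are made up for illustration,
and the CCVal difference is assumed to be already computed modulo 16 by the
caller.

```python
def accept_rtt_sample(ccval_delta, sample, rtt_estimate):
    """Return True if an RTT sample would pass the two rules listed above.

    ccval_delta  -- CCVal difference between the two packets (assumed to be
                    already reduced modulo 16 by the caller)
    sample       -- candidate RTT sample, same time unit as rtt_estimate
    rtt_estimate -- current RTT estimate
    """
    # Rule 1: a CCVal difference of 1..4 is always accepted.
    if 1 <= ccval_delta <= 4:
        return True
    # Rule 2 (optimisation): a larger difference is accepted only if the
    # sample lies strictly between half the estimate and the estimate.
    if ccval_delta > 4:
        return rtt_estimate / 2 < sample < rtt_estimate
    # A difference of 0 yields no sample.
    return False
```

For example, accept_rtt_sample(5, 60, 100) passes the second rule, whereas
accept_rtt_sample(5, 40, 100) is rejected because the sample is below half
the estimate.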

This clarified the log messages seen during the outage:
> Jul 15 22:01:26 kernel: [ 2311.949466] dccp_sane_rtt: RTT sample  4766615 out of bounds!
> Jul 15 22:01:39 kernel: [ 2324.335916] dccp_sane_rtt: RTT sample 12373169 out of bounds!
> Jul 15 22:02:11 kernel: [ 2356.548447] dccp_sane_rtt: RTT sample 32193564 out of bounds!
> Jul 15 22:03:15 kernel: [ 2420.760223] dccp_sane_rtt: RTT sample 64201733 out of bounds!

==> The message will be printed only if the CCVal difference is not 0 and the sample
    is the time interval since receiving the last packet.
    One possible cause could be reverse-path loss of feedback packets, causing halving of X.
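
As a cross-check of the "time interval since the last packet" reading, the
sketch below compares each logged sample with the gap between consecutive
kernel timestamps of the four messages above. It assumes the samples are in
microseconds, which the log itself does not state; the helper name is made up.

```python
# (kernel timestamp in seconds, logged RTT sample), copied from the log above.
log = [
    (2311.949466, 4766615),
    (2324.335916, 12373169),
    (2356.548447, 32193564),
    (2420.760223, 64201733),
]


def gaps_vs_samples(entries):
    """Pair the gap since the previous message with the sample in seconds.

    Assumption: samples are in microseconds.
    """
    rows = []
    for (t_prev, _), (t_cur, sample_us) in zip(entries, entries[1:]):
        rows.append((t_cur - t_prev, sample_us / 1e6))
    return rows


for gap, sample in gaps_vs_samples(log):
    print(f"gap {gap:7.3f} s   sample {sample:7.3f} s")
```

Under that assumption the last three samples agree with the gaps between the
log messages to within about 20 msec. This is only a consistency check on four
lines of log and says nothing about the cause.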

Five months after the event and with little data, the only way to know for sure is to
repeat the measurements.

Here is the setup as far as I recall:
 * access point was a D-Link System DWL-G122 802.11g USB Adapter (ralink rt73),
 * channel in the 2412 - 2467 MHz range (hostapd.conf set to channel 12)
 * client laptop used Intel 3945ABG (iwl3945, 802.11g)
 * distance between access point and client was less than 3 meters
 * RTS/CTS were set to 'off'
 * the 'retry' parameter for MAC retransmissions was set to 3-7
 * the average link RTT was 2 msec
 * 2-3 competing access points and roaming clients in the neighbourhood
   - not sure about microwave ovens, cellular phones (bluetooth), DECT
 * from TCP wireshark traces it is clear that there was interference on the 
   channel (dupAcks and retransmitted packets)

Gerrit