I have identified a bug in CCID 3's loss interval code affecting the 2nd loss interval. Essentially, the bug is that CCID 3 doesn't create the 2nd loss interval until the loss that starts the 3rd interval. Therefore, the 2nd loss interval can't be used to adjust the loss interval rate sent back to the sender until another loss occurs. I have observed this issue resulting in connections that maintain a very low sending rate for >30 seconds because the computed loss event rate from the first interval is 1 in 33 or so. When the 2nd loss finally occurs, the computed loss event rate suddenly jumps to 1 in 2000 or so and the allowed sending rate increases dramatically. See [1] for a time sequence graph of one of these connections. RFC4342 (which is consistent with RFC3448 and 5348) defines a loss interval as at most one round-trip time's worth of packets that may be lost followed by an arbitrarily long series of non-dropped packets (6.1). This, of course, doesn't work for the first interval so 3448 and 5348 require the sender to compute an artificial loss interval length for this first interval based on the receive rate (6.3.1). To compute the loss event rate the receiver is supposed to calculate the average loss interval length over the last 8 intervals. The current loss interval (i.e. the currently growing one) is only included if it increases the average loss interval length (5.4). The way this is implemented currently, the first loss interval structure is created in tfrc_lh_interval_add() (loss_interval.c) when the first packet loss occurs. It's length is artificially computed as specified in the RFCs. The bug is that a second loss interval structure is not created at this point for the new loss interval that starts with this first loss (I interpret 6.3.1 to imply that the first loss ends the first loss interval). With every packet received, the receiver calls tfrc_update_i_mean() to update the loss event rate by updating the current loss event length. However, if there is only one loss event structure this function returns before recomputing the loss event rate. That behavior is correct, because we don't want to tamper with the artificially computed first loss event length. If we had that second loss interval structure, we could update its length and include it in the loss event rate if/when it became big enough to increase the average loss interval length and thus decrease the loss event rate. When the second loss finally occurs, tfrc_lh_interval_add() is called again. This time the function overwrites the length of the first loss interval structure with the number of packets since the first loss, in effect deleting the first loss interval and replacing it by the second. The function then creates a new loss interval structure for the loss interval starting with the second loss. (This actually works correctly on all later loss intervals because then the "current" loss interval structure is actually the current loss interval, so this code just sets its final length and adds a new loss interval structure for the new loss interval) This new loss interval structure will be updated by tfrc_update_i_mean() and can affect the average loss interval length if it becomes long enough. From this point on the code will function as the RFCs expect. My fix (patch below) is simply to add a second loss event structure to represent the newly started loss event at the time the first loss is detected. [1] http://www.sjero.net/software/dccp/images/dccp_ccid3_2nd_loss_interval_bug.png -- Samuel Jero Masters Student Computer Science Internetworking Research Group Ohio University --- Signed-off-by: Samuel Jero <sj323707@xxxxxxxx> diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c index 47045e1..ad24819 100644 --- a/net/dccp/ccids/lib/loss_interval.c +++ b/net/dccp/ccids/lib/loss_interval.c @@ -198,9 +198,21 @@ bool tfrc_lh_interval_add(struct tfrc_loss_hist *lh, struct tfrc_rx_hist *rh, cur->li_ccval = cong_evt->tfrchrx_ccval; cur->li_is_closed = false; - if (++lh->counter == 1) + if (++lh->counter == 1) { lh->i_mean = cur->li_length = (*calc_first_li)(sk); - else { + cur->li_is_closed = true; + cur = tfrc_lh_demand_next(lh); + if (unlikely(cur == NULL)) { + DCCP_CRIT("Cannot allocate/add loss record."); + return false; + } + ++lh->counter; + cur->li_seqno = cong_evt_seqno; + cur->li_ccval = cong_evt->tfrchrx_ccval; + cur->li_is_closed = false; + cur->li_length = dccp_delta_seqno(cur->li_seqno, + tfrc_rx_hist_last_rcv(rh)->tfrchrx_seqno) + 1; + } else { /* RFC 5348, 5.3: length of the open loss interval I_0 */ cur->li_length = dccp_delta_seqno(cur->li_seqno, tfrc_rx_hist_last_rcv(rh)->tfrchrx_seqno) + 1;
Attachment:
signature.asc
Description: OpenPGP digital signature