Re: [PATCH 1/1] DCCP: Another attempt at fixing up t_nom

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The differences between this patch and 3e are diminishing. Both always update t_nom in 
ccid3_hc_tx_send_packet, so there would be no great leap moving	timeval_add_usecs
from ccid3_update_send_time to the end of ccid3_hc_tx_send_packet. (A further advantage
is that then t_ipi / delta is only recomputed when necessary, not for each packet sent.)

So the main difference is in the code regarding the initial t_ipi of 1 second. 


  
  +-----------*---------------------|--------------------------------
  |           ^                     |
  t_0         first feedback        t_0 + 1 second
              arrives
  first                             application wakes up
  packet                            second packet sent
  sent

The changes affect all packets from the second one on in the send queue. Since the t_ipi will
typically be much less than 1 second, the effect of subtracting 1 second and adding the current 
t_ipi to t_nom = t_0 has the effect that up to the entire tx queue is sent immediately. This
is because the subtraction creates a negative credit.

A numerical example: let's assume RTT = 5 ms, t_ipi = 100 usec, tx_qlen = 100. First packet has been
sent, so there are 99 packets in the queue when the first feedback packet arrives at t_0 + 5 msec.
Recomputing t_nom gives t_0 plus 100 usec, so we are already late for that. Subsequent call to 
sk_write_space in ccid3_hc_tx_packet_recv wakes up the application, which then calls ccid3_hc_tx_send_packet
in state FBACK: delay is negative, packet is sent immediately (as intended), then t_ipi=100usec are
added to t_nom. The result is that t_nom is now t_0 + 2* 100 usec, which still is -4.80 milliseconds
from now. As a result, in this example up to 50 packets are sent in a burst now, spaced by 100 usecs.

Without the `hack', the first packet is sent, when feedback arrives after 5msec, t_ipi is updated, but
not t_nom. Waking up the application via sk_write_space results in going back to sleep until, t_0 + 1 second,
after which the packets are sent with the correct spacing of 100 usec.

So this would slightly speed up things for the first batch of packets. The downside is that we are
opening the back door for another race condition which occurs when tx_send_packet tries to push out a 
packet shortly after t_ipi has been subtracted from t_nom; this would mess up the value of t_nom since
then t_ipi is added twice (both in update_send_time and in tx_send_packet via another call to update_send_time).

So my conclusion is, if you absolutely insist that re-subtracting t_ipi as above is better, we should merge your
patch and 3d, and mark clearly, via an `XXX' or `FIXME' that this is a temporary solution (and may be liable to 
a race condition).

Another alternative which we could do is to use a configuration option which sets a fixed value of an initial
RTT. From this value we could compute the initial speed (as per Eddie's email and in the same manner we would do
it if we had the initial RTT information), and uses this instead of the - agreed, it is rather larger - initial
t_ipi of 1 second. As with the other option, one could use the value `0' to disable the optional setting of a 
fixed initial RTT. 

I see in the initial value of the RTT the main difference between 3e and your patch and wait for your comments
on the above.

Gerrit

Quoting Ian McDonald:
|  Calculate t_nom as per RFC3448 section 4.6. We have been making it too
|  complicated previously.
|  
|  We do have a bit of a hack for initial ipi as we are not picking up the rtt
|  from initial setup. This should be tidied up in future.
|  
|  Signed-off-by: Ian McDonald <ian.mcdonald@xxxxxxxxxxx>
|  ---
|  diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
|  index 1ebb4da..57f9fb8 100644
|  --- a/net/dccp/ccids/ccid3.c
|  +++ b/net/dccp/ccids/ccid3.c
|  @@ -82,12 +82,10 @@ static void ccid3_hc_tx_set_state(struct sock *sk,
|   
|   /*
|    * Recalculate scheduled nominal send time t_nom, inter-packet interval
|  - * t_ipi, and delta value. Should be called after each change to X.
|  + * t_ipi, and delta value. Should be called after each packet sent.
|    */
|  -static inline void ccid3_update_send_time(struct ccid3_hc_tx_sock *hctx)
|  +static void ccid3_update_send_time(struct ccid3_hc_tx_sock *hctx)
|   {
|  -	timeval_sub_usecs(&hctx->ccid3hctx_t_nom, hctx->ccid3hctx_t_ipi);
|  -
|   	/* Calculate new t_ipi = s / X_inst (X_inst is in 64 * bytes/second) */
|   	hctx->ccid3hctx_t_ipi = scaled_div(hctx->ccid3hctx_s,
|   					   hctx->ccid3hctx_x >> 6);
|  @@ -118,8 +116,6 @@ static inline void ccid3_update_send_time(struct ccid3_hc_tx_sock *hctx)
|    *       fine-grained resolution of sending rates. This requires scaling by 2^6
|    *       throughout the code. Only X_calc is unscaled (in bytes/second).
|    *
|  - * If X has changed, we also update the scheduled send time t_now,
|  - * the inter-packet interval t_ipi, and the delta value.
|    */
|   static void ccid3_hc_tx_update_x(struct sock *sk, struct timeval *now)
|   
|  @@ -145,13 +141,10 @@ static void ccid3_hc_tx_update_x(struct sock *sk, struct timeval *now)
|   		hctx->ccid3hctx_t_ld = *now;
|   	}
|   
|  -	if (hctx->ccid3hctx_x != old_x) {
|  +	if (hctx->ccid3hctx_x != old_x)
|   		ccid3_pr_debug("X_prev=%llu, X_now=%llu, X_calc=%u, "
|   			       "X_recv=%llu\n", old_x >> 6, hctx->ccid3hctx_x >> 6,
|   			       hctx->ccid3hctx_x_calc, hctx->ccid3hctx_x_recv >> 6);
|  -
|  -		ccid3_update_send_time(hctx);
|  -	}
|   }
|   
|   /*
|  @@ -344,6 +337,8 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct sk_buff *skb)
|   
|   		/* Set t_0 for initial packet */
|   		hctx->ccid3hctx_t_nom = now;
|  +		timeval_add_usecs(&hctx->ccid3hctx_t_nom,
|  +		   hctx->ccid3hctx_t_ipi);
|   		break;
|   	case TFRC_SSTATE_NO_FBACK:
|   	case TFRC_SSTATE_FBACK:
|  @@ -361,6 +356,7 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct sk_buff *skb)
|   			return delay / 1000L;
|   
|   		ccid3_hc_tx_update_win_count(hctx, &now);
|  +		ccid3_update_send_time(hctx);
|   		break;
|   	case TFRC_SSTATE_TERM:
|   		DCCP_BUG("%s(%p) - Illegal state TERM", dccp_role(sk), sk);
|  @@ -371,9 +367,6 @@ static int ccid3_hc_tx_send_packet(struct sock *sk, struct sk_buff *skb)
|   	dp->dccps_hc_tx_insert_options = 1;
|   	DCCP_SKB_CB(skb)->dccpd_ccval = hctx->ccid3hctx_last_win_count;
|   
|  -	/* set the nominal send time for the next following packet */
|  -	timeval_add_usecs(&hctx->ccid3hctx_t_nom, hctx->ccid3hctx_t_ipi);
|  -
|   	return 0;
|   }
|   
|  @@ -486,6 +479,11 @@ static void ccid3_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
|   			hctx->ccid3hctx_x    = scaled_div(w_init << 6, r_sample);
|   			hctx->ccid3hctx_t_ld = now;
|   
|  +
|  +			/* adjust down from initial 1 second now we have rtt */
|  +			timeval_sub_usecs(&hctx->ccid3hctx_t_nom,
|  +			   hctx->ccid3hctx_t_ipi);
|  +
|   			ccid3_update_send_time(hctx);
|   
|   			ccid3_pr_debug("%s(%p), s=%u, MSS=%u, w_init=%u, "
|  -
|  To unsubscribe from this list: send the line "unsubscribe dccp" in
|  the body of a message to majordomo@xxxxxxxxxxxxxxx
|  More majordomo info at  http://vger.kernel.org/majordomo-info.html
|  
|  
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel]     [IETF DCCP]     [Linux Networking]     [Git]     [Security]     [Linux Assembly]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]

  Powered by Linux