CCID 3: Strategy for removing division bugs and division errors

Gerrit Renker <gerrit@xxxxxxxxxxxxxx> · Wed, 29 Nov 2006 15:15:46 +0000

As promised, I sat down and wrote up conditions and current problems with
some of the calculations in the CCID 3 module. Changes are required only for the sender.

In a nutshell, the strategy solves
 * the problem that dividing-by-seconds is not handled correctly
 * defines a sufficient resolution for tracking sending rates
 * gets rid of the overflow problem 

The cost is that 
 * at least 1 and ideally 3 structure elements of tfrc_tx_info make it to __u64
 ===> Arnaldo are you OK with this?

but this has the advantage of tackling both overflow and the missing resolution of
lower send rates.

The detailed derivation is below, I am putting this into patches at the moment. It
would be good to have this codified before Ian starts testing. 

       Reasonable assumptions for and a corresponding scheme for the calculation of CCID 3 rates
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Problem:
 ========
  The CCID 3 specification uses bytes, seconds, and bytes/second as units. Additionally, the
  timestamp options (Timestamp, Timestamp Echo, and Elapsed Time) require a representation of time
  in units of 10 * microseconds. The kernel internally uses microseconds to represent time. 
  This results in many divisions of the type `s * 1E6 / t' where s is a packet size in bytes and
  t is a time unit in microseconds. 

  (a) Problem 1: Overflow
      -------------------
      Since currently receive/sending rates are stored as u32, this leads to overflow. 
      There are several  ways of tackling this:
             * the patch by Nishida-san implements a subset of rational arithmetic where
               all numbers are expressed as a 
                           struct  fix-point { long long num; 
                                              long long denom;
                           }; 
                The problem with this is that it has to be carried through for each single
                elementary operation (+,-,*,==,>,<,/,%), which is messy.

             * the current Linux implementation uses a function 

                       usecs_div(u32 a, u32 b)   =    a * 1E6 / b

               which computes a divisor in such a way that u32 overflow is avoided for a. 
               While this goes a some way towards avoiding overflow, it comes at a price: 
                   since a piecewise-linear function is used, a constant is substituted
                   for many different values. The result is a substantial computation error 
                   - this can be as high as 100% (I have a proof and test program). 

  (b) Problem 2: What to do with non-microsecond time units
      -----------------------------------------------------
      There are a few computations and constants in RFC 3448 and RFC 4342 which are not based
      on microsecond time units. The following table lists these    

      s/t_mbi			4.3	bytes/second		0 if s < 64
      s/(2*t_mbi)		4.4	bytes/second		0 if s < 32
      X_recv = X_calc/4		4.4	bytes/second		0 if X_calc < 4 bytes/second
      X/2			4.4	bytes second

      These expressions are currently not handled correctly - this is a BUG in CCID 3. 

 Solution:
 =========
  A brute-force idea is to scaled everything by 1E6 and use bytes/microsecond (or 1e6 bytes/second)
  as basis for sending rates. But this introduces overflow for many calculations - even on 64 bit.

  i)  Minimal granularity for sending/receiving rate
  --------------------------------------------------
      A high granularity is not necessary:

        * [RFC 3448, 4.3] specifies the minimum sending rate as 1 packet in 64 seconds

        * the packet size `s' is never smaller than 1 byte 
                             and never larger than  2^32 - 1  (4 GByte), since this is 
            (a) the largest size of an IPv6 jumbogram (RFC 2765) and
            (b) the type limit of the current representation
        * hence,
			1/64  <=    X   <= 2^32 - 1 

      It is therefore fully sufficient to store X (X_recv, X_calc) in units of 64bytes/second.
      Internet speeds are growing fast, having a resolution of speeds higher than 1/64 is thus
      a truly wasted effort. 

 ii)  Implications for storage size
 -----------------------------------
      Currently X, X_calc, X_recv are all stored as u32. Due to the above this is no longer 
      sufficient: we are now at least 6 bits short.

        * the receiver does not necessarily need a modification since it sends X_recv as u32 anyway

        * at the sender it is however better to scale all X quantities. It is like enlarging a
          picture - everything needs to keep its proportions.

        * hence the changes will look like:
            struct tfrc_tx_info {
			__u64 tfrctx_x;			/* this is really required */
			__u64 tfrctx_x_recv;		/* would be better to have as 64-bit */
			__u64 tfrctx_x_calc;		/* 	ditto 		   */
			/* ... */
	     };

        * the advantages are that
               (a) usecs_div is now much simpler
               (b) it would mean a headache and a mess (but it is possible) to keep the sending
                   rate in different sizes (scaled/unscaled); but with the above we don't need to
               (c) u64 is sufficient even when scaling both by 64 and by 1E6

 iii) Implications for algorithm
 -------------------------------
      This is a clerical job but needs to be done systematically: at the sender, all operations 
      and comparisons have to be changed with regard to the scaled format. This is now done below.
      For the receiver, no changes are required.

      * Initialisation [RFC 3448, 4.2]
          Set X to 64 * 1 packet / second   (i.e. X = s << 6)

      * Sender behaviour when feedback is received [RFC 3448, 4.3 - step (4)]

          ==> When X_recv is computed set
            X_recv = hctx->packet_options_x_recv << 6;

          If (p > 0) {
            X_calc = calcX(R, s, p);             /* we leave that as it is for the moment */
            X_calc << 6;                         /* scale by 64                           */
            X  =  max(min(X_calc, 2*X_recv),     /* X_recv is also scaled */
                      (s << 6) / t_mbi)     );   /* now it is correct     */   

          } Else If(t_now - tld >= R) {
               X   = max(2 * min(X, X_recv), scaled_div(s << 6, R);    /* now also works */
               tld = t_now
          } 

          Where the analogue of usecs_div now is renamed into:

               u64 scaled_div(u64 size, u32 time)  { return (size * 1000000) / time; }

        ==> step (5): the nofeedback timer now expires in 

               max(4 * R, 2 * scaled_div(s, X >> 6))     /* to obtain microseconds */

            This use is why I would like to suggest a renaming into scaled_div - we are not
            dividing by microseconds here.

      * Expiration of nofeedback timer [RFC 3448, 4.4]

        (a) If the sender already has got some feedback:

              If (X_calc > 2 * X_recv)                          /* comparison is in proportion */
                  X_recv = max(X_recv/2, (s << 6)/(2*t_mbi));   /* now works as expected */
              Else                    
                  X_recv = X_calc/4

              ==> expire the nofeedback timer after 
                  max(4 * R, 2 * scaled_div(s, X >> 6))         /* to obtain microseconds */

        (b) If the sender does not yet have any feedback then

              X = max(X/2, (s << 6)/t_mbi)                      /* now also works as expected */

              ==> expire the nofeedback timer after 2 seconds

      * Preventing oscillations [RFC 3448, 4.5]

        This is currently not supported, but the solution is forward-compatible, since the scaled
        X can be kept as it is.

      * Scheduling of Packet Transmissions [RFC 3448, 4.6]

	        t_ipi   =   scaled_div(s, X >> 6)              /* yes, microseconds */

      * Larger initial windows [RFC 4342, sec. 5]

                w_init = min(4 * s, max(2 * s, 4380))
                X      = scaled_div(w_init << 6, R)

That's all !
-
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html