Re: [PATCH 2/25]: Avoid accumulation of large send credit

Hi Eddie,

This email is confused and angry, so before even starting with the facts, can I just
apologize for having asked you not to send any offline emails? That was probably
a bad thing to do, sorry.

With that out of the way, can we please take a cooler look at the facts.


|  Gerrit.  I know the implementation is broken for high rates.  But you are 
|  saying that it is impossible to implement CCID3 congestion control at high 
|  rates.  I am not convinced.  Among other things, CCID3's t_gran section gives 
|  the implementation EXACTLY the flexibility required to smoothly transition 
|  from a purely rate-based, packet-at-a-time sending algorithm to a hybrid 
|  algorithm where periodic bursts provide a rate that is on average X.
|  
|  Your examples repeatedly demonstrate that the current implementation is 
|  broken.  Cool.
Unfortunately it is, and I say this without any glee. It was broken before I started work
on it, and for that matter probably even before Ian converted the code.

The problem is that, due to the slow-start mechanism, the sender will always try
to ramp up to link speed, and thus invariably reaches packet spacings so small that it
cannot control them.
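
To see why, consider the numbers (a back-of-the-envelope sketch, not the
actual ccid3 code; HZ = 1000 and s = 1500 bytes are illustrative values):

/* Inter-packet interval for a target rate X, expressed in scheduler
 * ticks.  Any rate above s * HZ = 12 Mbit/s yields an interval below
 * one tick, which rounds down to zero - the timer can then no longer
 * space the packets at all.
 */
#include <stdio.h>

int main(void)
{
	const long HZ = 1000;		/* scheduler ticks per second */
	const long s  = 1500;		/* segment size in bytes      */
	const double rates[] = { 1e6, 12e6, 454e6 };	/* X, bits/sec */

	for (int i = 0; i < 3; i++) {
		double X     = rates[i] / 8;		/* bytes/sec          */
		double t_ipi = s / X;			/* ideal spacing, sec */
		long   ticks = (long)(t_ipi * HZ);	/* rounds down        */

		printf("X = %3.0f Mbit/s -> t_ipi = %7.1f us = %ld ticks\n",
		       rates[i] / 1e6, t_ipi * 1e6, ticks);
	}
	return 0;
}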

I didn't say CCID3 is impossible, and I didn't say that your specification was bad.

What I meant to say is that trying to implement the algorithm "exactly and explicitly"
out of the book does not work: on the one hand it ignores the realities of the operating
system (scheduling granularity, processing costs, inaccuracies, delays), and on the other
hand it ignores the realities of networking - as per David's and Ian's answers.

So the point is merely that the goals of TFRC need to somehow be rephrased in terms of
what can be done sensibly. 

I think there are a lot of very valuable points to be learned from David's input,
and whether or not one listens carefully to such hints can make the key difference
as to whether CCID3 works in the real world, too.

I really hope that the points raised at the end of last week will somehow find their way
into the TFRC/CCID3 specification.


|  If you were to just say this was an interim fix it would be easier, but I'd 
|  still be confused, since fixing this issue does not seem hard.  Just limit the 
|  accumulated send credit to something greater than 0, such as the RTT.  But you 
|  hate that for some reason that you are not articulating.
No, sorry, it is not a quick fix. I think it requires some rethinking.
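
For reference, the cap suggested above would, as I understand it, look
roughly like the sketch below (an illustrative userspace demo; t_nom and
rtt are made-up names, not the actual ccid3 fields):

/* Bound the accumulated send credit at one RTT.  All times in
 * microseconds; values are examples only.
 */
#include <stdio.h>

static long cap_credit(long t_now, long t_nom, long rtt)
{
	/* credit = t_now - t_nom; never let it exceed one RTT */
	if (t_now - t_nom > rtt)
		t_nom = t_now - rtt;
	return t_nom;
}

int main(void)
{
	long t_now = 5000000;	/* "now", 5 s after start      */
	long t_nom = 1000000;	/* 4 s of idle credit built up */
	long rtt   = 100000;	/* 100 ms round-trip time      */

	t_nom = cap_credit(t_now, t_nom, rtt);
	printf("credit after cap: %ld us (at most one RTT)\n",
	       t_now - t_nom);
	return 0;
}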

|  It's here that you go off the rails:
|  
|   > Seriously, I think that Linux or any other scheduler-based OS is simply the
|   > wrong platform for CCID3, this here can not give you the precision and the
|   > controllability that your specification assumes and requires.
|  
|  The specification neither assumes nor requires this and in fact has an 
|  EXPLICIT section that EXACTLY addresses this problem, 4.6 of 3448.
I think that this is `exactly' the problem - we cannot meet the goals of that
specification by implementing it `explicitly'. The explicit requirements constrain
the implementation, which itself is constrained by the realities of what can be
implemented and what works in a real network.

By relaxing that explicitness, you would give implementers the freedom to meet the 
goals of your specification. And you would win tremendously from that - especially
when using the input from David or Arnaldo.
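
To illustrate the kind of freedom I mean (a sketch only, assuming a
1-millisecond tick; this is not the current ccid3 code): once t_ipi
falls below the scheduling granularity t_gran, one could send a small
burst per tick so that the average over the tick still equals X.

/* The t_gran hybrid of RFC 3448, 4.6: when the ideal inter-packet
 * interval t_ipi = s/X drops below the OS scheduling granularity
 * t_gran, send ceil(t_gran / t_ipi) packets per timer tick, so the
 * average rate over the tick is still X.  Illustrative values only;
 * build with -lm.
 */
#include <math.h>
#include <stdio.h>

int main(void)
{
	const double t_gran = 1e-3;		/* 1 ms tick (HZ = 1000)  */
	const double s      = 1500;		/* bytes per packet       */
	const double X      = 454e6 / 8;	/* target rate, bytes/sec */

	double t_ipi = s / X;			/* ideal packet spacing   */
	long   burst = (long)ceil(t_gran / t_ipi);

	printf("t_ipi = %.1f us -> burst of %ld packets per %.0f ms tick\n",
	       t_ipi * 1e6, burst, t_gran * 1e3);
	return 0;
}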


 
|   > CCID2 works nicely since it does not have all these precision requirements.
|  
|  To put it mildly, you have not provided evidence that CCID3 does either.
Oh, I did - several months ago, all posted to the list and mentioned several times.
The links are
	http://www.erg.abdn.ac.uk/users/gerrit/dccp/docs/packet_scheduling/
	http://www.erg.abdn.ac.uk/users/gerrit/dccp/docs/impact_of_tx_queue_lenghts/

  
|  Ian: do you want to collaborate on a patch for this?
A patch for a conceptual problem? Please do.


Thanks.


  
|  Gerrit Renker wrote:
|  > Your arguments consider only the specification. What you don't see, and Ian also doesn't seem
|  > to see, is that this implementation conforms to the ideas of TFRC only up to a maximum speed
|  > of s * HZ bytes per second; under benign conditions this is about 12..15 Mbits/sec.
|  > 
|  > Once you are past that speed you effectively have a `raw socket' module whose only resemblance 
|  > to TFRC/DCCP is the packet format, without even a hint of congestion control.
|  > 
|  > Here for instance is typical output, copied & pasted just a minute ago:
|  > 
|  > $ iperf -sd -t20
|  > ------------------------------------------------------------
|  > Server listening on DCCP port 5001
|  > DCCP datagram buffer size:   106 KByte (default)
|  > ------------------------------------------------------------
|  > [  4] local 192.235.214.65 port 5001 connected with 192.235.214.75 port 40524
|  > [  4]  0.0-20.4 sec  1.08 GBytes    454 Mbits/sec                              
|  > 
|  > If you ask the above sender to reduce its speed to 200 Mbits/sec in response to network congestion
|  > reported via ECN or receiver feedback, it will _not_ do that - simply because it is unable to control
|  > those speeds. It will continue to send at maximum speed (up to 80% of link bandwidth is possible).
|  > 
|  > Only when you ask it to reduce below s * HZ will it be able to slow down, which here would mean
|  > to reduce from 454 Mbits/sec to 12 Mbits/sec.
|  > 
|  > That said, without this patch you will get a stampede of packets for the other reason that
|  > the scheduler is not as precise as required: the lag arising from interpreting e.g. 1.7 as 1
|  > and 0.9 as 0 milliseconds will always add up. I still would like this patch in for exactly
|  > these reasons.
|  > 
|  > Seriously, I think that Linux or any other scheduler-based OS is simply the wrong platform for CCID3;
|  > it cannot give you the precision and the controllability that your specification assumes
|  > and requires.
|  > 
|  > You are aware of Ian's (and I doubt whether he is the only one) aversion to high-res timers.
|  > Using them would remove these silly accumulations and remove the need for patches such as this one.
|  > 
|  > The other case is the use of interface timestamps. With interface timestamps, I was able to accurately
|  > sample the link RTT as it is reported e.g. by ping. With the present layer-4 timestamps, it goes
|  > back up to very high values, simply because the inaccuracies all add up.
|  > 
|  > Conversely, it very much seems that the specification needs some revision before it becomes implementable
|  > on a non-realtime OS. Can you give us something which we can implement within the constraints we have
|  > (i.e. no interface timestamps, no high-res timers, accumulation of inaccuracies)?
|  > 
|  > CCID2 works nicely since it does not have all these precision requirements.
|  > 
|  > 
|  > 
|  > 
|  > 
|  > Quoting Eddie Kohler:
|  > |  > That is one of the problems here - in the RFC such problems do not arise, but the implementation needs
|  > |  > to address these correctly.
|  > |  
|  > |  The RFC's solution to this problem, which involves t_gran, EXACTLY addresses this
|  > |  
|  > |  > |  Your token bucket math, incidentally, is wrong.  The easiest way to see this 
|  > |  > |  is to note that, according to your math, ANY token bucket filter attempting to 
|  > |  > |  limit the average output rate would have to have n = 0, making TBFs useless. 
|  > |  > |  The critical error is in assuming that a TBF allows "s * (n + R * rho)" bytes 
|  > |  > |  to be sent in a period R.  This is not right; a TBF allows a maximum of s * R 
|  > |  > |  * rho per longer-term period R; that's the point.  A token bucket filter 
|  > |  > |  allows only SHORT-term bursts to compensate for earlier slow periods.  Which 
|  > |  > |  is exactly what we need.
|  > |  > Please take another look. The formula is correct (you will find the same one e.g. in Andrew
|  > |  > Tanenbaum's book).
|  > |  
|  > |  So I assume what you are referring to is the clause "average rate OVER ONE 
|  > |  RTT"?  Sorry I missed that.  I missed it because it is not TFRC's goal.  Can 
|  > |  you point to the section in RFC3448 or RFC4342 that prohibits a TFRC sender 
|  > |  from ever sending a (transient) rate more than X over one RTT?  RFC3448 4.6 
|  > |  allows burstiness much more than a single packet, and the intro allows 
|  > |  fluctuations of up to a factor of 2 relative to the fair rate
|  > |  
|  > |  > I think (with regard to the paragraph below) that your perspective is an entirely different one,
|  > |  > namely to solve the question "which kind of token bucket do we need to obtain a rate which
|  > |  > is on average consistent with X". 
|  > |  
|  > |  That is *TFRC's* perspective: finding packet sends that on average are 
|  > |  consistent with X.  As demonstrated by 4.6 and elsewhere
|  > |  
|  > |  How much above X may an application transiently send?  The intro would argue 2x.
|  > |  
|  > |  > But until this is truly resolved I want this patch in.
|  > |  
|  > |  Fine, I disagree, Ian disagrees (as far as I read his messages).  You are 
|  > |  fixing one problem and creating another: artificially low send rates
|  > |  
|  > |  Eddie
