Re: [PATCH 1/1] DCCP: Fix up t_nom - FOLLOW-UP

Summary: I agree with Ian that hrtimer support is not required, and that bursts are OK. They are explicitly allowed by the RFC.


  When a sender first starts sending at time t_0, it calculates t_ipi,
  and calculates a nominal send time t_1 = t_0 + t_ipi for packet 1.
  When the application becomes idle, it checks the current time, t_now,
  and then requests re-scheduling after (t_ipi - (t_now - t_0))
  seconds.  When the application is re-scheduled, it checks the current
  time, t_now, again.  If (t_now > t_1 - delta) then packet 1 is sent.

Note that initially we set t_ipi to 1 second. This could be set to a
better value based on connection setup, as per your earlier discussion
with Eddie, but I haven't implemented this yet. In this way my code is
a hack: it removes the 1 second and adds the initial RTT once we
obtain it. This ugliness can be removed once we make the code base
conform to the RFC intent (it is not in the RFC yet, but Eddie said he
would propose it for revision).
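
In code the hack amounts to roughly the following (a sketch only; the
struct and names are made up for illustration, not the actual ccid3
code, and USEC_PER_SEC is assumed from <linux/time.h>):

    /* Sketch only: made-up names, not the actual ccid3 code. */
    struct hc_tx_sketch {
        u32 t_ipi;          /* inter-packet interval, in microseconds */
        u32 rtt_estimate;   /* 0 until the first RTT sample arrives   */
    };

    static void sketch_update_ipi(struct hc_tx_sketch *hc)
    {
        if (hc->rtt_estimate == 0) {
            /* the current hack: one packet per second until an RTT exists */
            hc->t_ipi = USEC_PER_SEC;
            return;
        }
        /*
         * Once the initial (Request-Response) RTT is available, drop
         * the 1-second value and start at one packet per RTT instead.
         */
        hc->t_ipi = hc->rtt_estimate;
    }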

For what it's worth, it's as close to being in the RFC as it can get without a revision. The authors of the RFC agree that we meant the initial Request-Response RTT to be usable as an initial RTT estimate; the working group agreed; an erratum has been sent.


  Now a new t_ipi may be calculated, and used to calculate a nominal
  send time t_2 for packet 2: t_2 = t_1 + t_ipi.  The process then
  repeats, with each successive packet's send time being calculated
  from the nominal send time of the previous packet.

  In some cases, when the nominal send time, t_i, of the next packet is
  calculated, it may already be the case that t_now > t_i - delta.  In
  such a case the packet should be sent immediately.  Thus if the
  operating system has coarse timer granularity and the transmit rate
  is high, then TFRC may send short bursts of several packets separated
  by intervals of the OS timer granularity.
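
For reference, the scheduling rule quoted above boils down to roughly
the following (a sketch only; the helper functions are assumptions of
mine, not real kernel APIs):

    /*
     * Sketch of the quoted algorithm -- names are made up.  All times
     * are in microseconds; *t_nom is the nominal send time of the next
     * packet and delta the scheduling tolerance.
     */
    static void sketch_send_or_reschedule(u64 *t_nom, u32 t_ipi, u32 delta)
    {
        u64 t_now = sketch_time_now_us();           /* assumed helper */

        if (t_now + delta < *t_nom) {
            /* too early: ask to be woken again at t_nom */
            sketch_reschedule_us(*t_nom - t_now);   /* assumed helper */
            return;
        }

        /*
         * t_now > t_nom - delta: send immediately.  If the timer fired
         * late, several nominal send times may already lie in the past,
         * so several packets go out back to back -- the short bursts
         * the RFC explicitly allows.
         */
        sketch_transmit_packet();                   /* assumed helper */

        /* the next nominal time is derived from the previous one, not from t_now */
        *t_nom += t_ipi;
    }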

Note a couple of phrases here: "the packet should be sent immediately"
and "TFRC may send short bursts of several packets separated by
intervals of the OS timer granularity". This is why I don't think we
should ever reset t_nom to the current time. Doing so stops us from
achieving the required average rate. By resetting t_nom we turn X into
the maximum instantaneous rate rather than the average rate we are
trying to achieve.
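
To make the difference concrete, a worked example with made-up
numbers: suppose t_ipi = 1 ms (X = 1000 packets/s) and the scheduler
can only wake us every 4 ms.

    keep t_nom += t_ipi:   each wakeup finds ~4 nominal send times already
                           in the past and sends a burst of 4 packets
                           -> ~1000 packets/s, the average rate X is achieved
    reset t_nom = t_now:   each wakeup sends a single packet and the deficit
                           is discarded
                           -> at most ~250 packets/s, well below X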

Absolutely agree.


Therefore I wonder if there is some kind of `micro/nanosleep' which we can use? Did some grepping and inevitably landed in kernel/hrtimers.c - any advice on how to best deploy these?

On healthy links the inter-packet times are often in the range of multiples of 10 microseconds (60 microseconds is frequent).

I don't believe we need to do this from a practical point of view, as
the RFC says we aim for an average packet rate rather than precise
timing. Whether we achieve better results from a smoother packet flow
given precise timing is interesting from a research point of view, and
I have seen published literature on this effect. It is not necessary
for RFC compliance, though.

Absolutely agree.  Whatever is easiest.


However, if you want to implement it for research purposes that is
fine, provided we satisfy a couple of criteria:
- negligible performance impact. I would have thought that hrtimers
would add a fair amount of overhead. Is the point of hrtimers to
achieve a large number of timers per second with low overhead, or do
they let you schedule an event at precisely the time you require? By
my calculations, using CCID3 on a 1 Gbit/s link with one hrtimer per
packet would mean around 90,000 timer fires per second, versus a
maximum of 1000 timeslices with a standard setup and HZ of 1000 (the
arithmetic is sketched just after this list). Interrupting other
operations 90,000 times per second is surely not good?
- cross-platform support. I'm not sure whether platforms like ARM
support hrtimers. I want to see CCID3 used across many platforms, and
small embedded devices based on ARM would be a key market (your
cellphone, for video calls, for example).
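
For reference, the arithmetic behind the ~90,000 figure (the packet
size of roughly 1400 bytes is my assumption):

    1 Gbit/s                       ~= 125,000,000 bytes/s
    125,000,000 / ~1400 bytes/pkt  ~=      89,000 packets/s
    one hrtimer per packet         ~=      90,000 timer fires per second,
    versus at most HZ = 1000 timer wakeups per second today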

Based on this I think hrtimer support is not needed, but if it is
done it should be an option, not a requirement.

Absolutely agree squared.

Eddie


(B) Fixing send time for packets which are too late

You were mentioning bursts of packets which appear to be too late. I consulted a colleague about how to fix this: the solution seems much more complicated than the current infrastructure supports.
Suppose the TX queue length is n and the packet at the head of the queue is too late. Then one would need to recompute the sending times for each packet in the TX queue, taking the tardiness into account (it would not be sufficient to simply drain the queue). It seems we would need to implement a type of credit-based system (e.g. a Token Bucket Filter), in effect a kind of Qdisc on layer 4.
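
Nothing like this exists in the current code -- just to make the idea
concrete, a TBF-style credit check might look roughly as follows (all
names are made up; times in microseconds, USEC_PER_SEC assumed from
<linux/time.h>):

    /*
     * Sketch of a credit-based (Token Bucket Filter style) check.
     * Credit accrues at the allowed rate X and is spent per packet, so
     * the tardiness of the head-of-queue packet is carried forward
     * automatically instead of recomputing the send time of every
     * packet in the TX queue.
     */
    struct tx_credit_sketch {
        u64 last_update;    /* time of the last credit update, usecs  */
        s64 credit;         /* bytes we are currently allowed to send */
        u32 rate;           /* X, in bytes per second                 */
        u32 burst;          /* upper bound on accumulated credit      */
    };

    static int sketch_may_send(struct tx_credit_sketch *tb, u32 pkt_len, u64 now)
    {
        /* earn credit for the time elapsed since the last check */
        tb->credit += (s64)((now - tb->last_update) * tb->rate) / USEC_PER_SEC;
        if (tb->credit > tb->burst)
            tb->credit = tb->burst;    /* bound the burst size */
        tb->last_update = now;

        if (tb->credit < pkt_len)
            return 0;                  /* not enough credit yet: stay queued */

        tb->credit -= pkt_len;         /* spend the credit and send */
        return 1;
    }

The burst cap here plays a role similar to the RFC's allowance for short bursts; sizing it sensibly is part of what makes this more complicated than the current single-t_nom scheme.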

When using only a single t_nom for the next-packet-to-send as we are doing at the moment, resetting t_nom to t_now when packets are late seems the most sensible thing to do.

So what I think we should do is fix the packet scheduling algorithm to use a finer-grained delay. Since in practice no one is really interested in speeds of 1 kbyte/sec and below, staying with the coarse timer granularity in effect means that we are not controlling packet spacing at all.

See above.

