On Tue, 2006-07-04 at 15:29 +0200, Patrick McHardy wrote:
> Unfortunately I still didn't got to cleaning them up, so I'm sending
> them in their preliminary state. Its not much that is missing, but
> the netem usage of skb->cb needs to be integrated better, I failed
> to move it to the qdisc_skb_cb so far because of circular includes.
Cleanups aside, architecturally the bulk of your patch
looks like a no-brainier to me. The calculation of
packet length should be in one place. Caching it in
skb->cb was a nice touch.
> But nothing unfixable. I'm mostly interested if the current size-tables
> can express what you need for ATM, I wasn't able to understand the
> big comment in tc_core.c in your patch.
Unfortunately you do things in the wrong order for ATM.
See: http://mailman.ds9a.nl/pipermail/lartc/2006q1/018314.html
for an overview of the problem, and then the attached email for
a detailed description of how the current patch addresses it.
It is a trivial fix.
As I said earlier, RTAB and STAB contain the same numbers,
just scaled differently. The ATM patch stuffed around with
RTAB. With your patch in place it will have to do the same
exactly the same thing with STAB - because RTAB and STAB
carry the same data. So to me the two patches seem
orthogonal.
One observation is the size optimisation you applied to STAB,
making it variable length, could also be applied to RTAB.
In fact it should be. Then they would be identical, apart
from the scaling. Even the lookup operation (performed in
qdisc_init_len in your patch) would be identical.
However, now you lot have made me go away and think, I have
another idea on how to attack this. Perhaps it will be
more palatable to you. It would replace RTAB and STAB with
a 28 byte structure for most protocol stacks - well all I can
think of off the top of my head, anyway. RTAB would have to
remain for backwards compatibility, of course.
--- Begin Message ---
Explanation of rate calculation code.
tc_core.c lines 53..61, tc_align_to_cells():
Does the same thing as you function of the same name.
Code is intended to be semantically equivalent.
tc_core.c lines 145..160, in tc_calc_ratespec:
There are two material differences to your code. Both
are in this line:
int sz = ((i+1)<<cell_log) - 1 + overhead;
The first is that where I add the overhead, you don't.
This is because I am calculating the rate table so it
works as well as close to optimal as possible on an
unpatched kernel. Given the background in your draft
email, I assume I don't have to prove you that it does
in fact calculate the best possible table for an
unpatched kernel.
The second change is I use:
((i+1)<<cell_log) - 1
Whereas you had:
((i+1)<<cell_log)
The number calculated by this expression:
((i+1) << cell_log)
is the shortest packet length rtab[(i+1)] applies to.
Ergo, that number minus 1 must be the longest packet
length the previous rtab entry, rtab[i], applies to.
So the intent of the change is that in unpatched kernels,
rtab[i] will contain the time to send the longest
packet that uses it. I suspect your code ends up doing
the same thing on kernels that add overhead to the
packet length.
There should be no other semantic differences between and
how you and I calculate the rate table. If so this should
be sufficient to show that if your code calculated an ideal
rate table for kernels that added the overhead to the
packet length before looking up the rate table, my code
should calculate the best possible table for unpatched
kernels. It won't be ideal because some table entries will
apply to packets using different numbers of ATM cells.
This is unavoidable. But where that happens my code will
always be conservative, ie use the longest packet length.
I think backward the compatibility increases the odds of
the patch being accepted. The next step is to patch the
kernel it can get 100% accurate results by using this
table.
Consider this "diagram":
Overhead 0
----------
ATM Payload: 39 40 41 42 43 44 45 46 47 48 49 50
skb->len: 39 40 41 42 43 44 45 46 47 48 49 50
rtab[skb->len/8]: 1 1 1 1 1 1 1 1 1 x2 2 2
Each column is for a particular packet length. The rows
show the packet length as seen by a layer of the protocol.
The top row, "ATM Payload", contains packet length seen
by the ATM layer - ie the ATM payload. It includes all
overheads bar cell overhead and padding. It is always
equal to: skb->len + overhead.
The second row, "skb->len" contains the packet seen by
the kernel. It is just the figure in the first line
less the "overhead" passed to tc.
The third line shows how "tc" will calculate the rate
table entry for the packet length. It contains the
number of ATM cells "tc" thinks the packet will occupy.
I am assuming a cell_log of 3 here, meaning each rate
table entry is used for 8 different packet lengths.
In an unpatched kernel a single rate table entry may
apply to packets which use different numbers of ATM
cells. Since the rate table entry is a single number
it cant be right for all packet lengths when that
happens. If faced with this ambiguity tc uses the number
of cells occupied by the longest packet the rate table
entry applies to. This means the rate table will be wrong
for the shorter packets, and this is highlighted in the
diagram an "x".
Thus we see from the table that for a 39 byte packet tc
thinks it will take 1 cell to send. But for a 48 byte
packet the table says it will take "x2" packets. The "x"
means it will really take 1 packet, but because this rate
table entry is also used by 49..55 byte packets tc used
the bigger value for that rate table entry.
We can fix the errant "x2" entry by sliding the rate
table along so it sits under the smallest packet length
it could apply to (in this case 49). To achieve that
the rate table must be slide one byte to right. This
can be achieved by adding an "alignment" figure of -1
to the packet length before looking up the rate table.
Here is the same diagram as above, but with two
additional lines added showing the effect of adding the
alignment:
Overhead 0
----------
ATM Payload: 39 40 41 42 43 44 45 46 47 48 49 50
skb->len: 39 40 41 42 43 44 45 46 47 48 49 50
rtab[skb->len/8]: 1 1 1 1 1 1 1 1 1 x2 2 2
skb->len-1: 38 39 40 41 42 43 44 45 46 47 48 49
rtab[(skb->len-1)/8]: 1 1 1 1 1 1 1 1 1 1 2 2
The rows "skb->len-1" and "rtab[(skb->len-1)/8]" are
copies of the rows above that have been moved slightly.
They have to be copies, as the rtab computed by tc hasn't
changed, so for example 47 must still yield 1 cell, and
48 must still yield 2 cells.
Below is the same diagram repeated for overheads 1 .. 9.
The are all pretty much the same as the one above. The
only changes is the increasing overhead, which in turn
changes the number of columns in error. The alignment
changes to compensate. You will see it follows a
straightforward pattern as the overhead increases.
Overhead 1
----------
ATM Payload: 39 40 41 42 43 44 45 46 47 48 49 50
skb->len: 38 39 40 41 42 43 44 45 46 47 48 49
rtab[skb->len/8]: 1 1 1 1 1 1 1 1 1 1 2 2
skb->len-0: 38 39 40 41 42 43 44 45 46 47 48 49
rtab[(skb->len-0)/8]: 1 1 1 1 1 1 1 1 1 1 2 2
Overhead 2
----------
ATM Payload: 39 40 41 42 43 44 45 46 47 48 49 50
skb->len: 37 38 39 40 41 42 43 44 45 46 47 48
rtab[skb->len/8]: 1 1 1 x2 x2 x2 x2 x2 x2 x2 2 2
skb->len-7: 30 31 32 33 34 35 36 37 38 39 40 41
rtab[(skb->len-7)/8]: 1 1 1 1 1 1 1 1 1 1 2 2
Overhead 3
----------
ATM Payload: 39 40 41 42 43 44 45 46 47 48 49 50
skb->len: 36 37 38 39 40 41 42 43 44 45 46 47
rtab[skb->len/8]: 1 1 1 1 x2 x2 x2 x2 x2 x2 2 2
skb->len-6: 30 31 32 33 34 35 36 37 38 39 40 41
rtab[(skb->len-6)/8]: 1 1 1 1 1 1 1 1 1 1 2 2
Overhead 4
----------
ATM Payload: 39 40 41 42 43 44 45 46 47 48 49 50
skb->len: 35 36 37 38 39 40 41 42 43 44 45 46
rtab[skb->len/8]: 1 1 1 1 1 x2 x2 x2 x2 x2 2 2
skb->len-5: 30 31 32 33 34 35 36 37 38 39 40 41
rtab[(skb->len-5)/8]: 1 1 1 1 1 1 1 1 1 1 2 2
Overhead 5
----------
ATM Payload: 39 40 41 42 43 44 45 46 47 48 49 50
skb->len: 34 35 36 37 38 39 40 41 42 43 44 45
rtab[skb->len/8]: 1 1 1 1 1 1 x2 x2 x2 x2 2 2
skb->len-4: 30 31 32 33 34 35 36 37 38 39 40 41
rtab[(skb->len-4)/8]: 1 1 1 1 1 1 1 1 1 1 2 2
Overhead 6
----------
ATM Payload: 39 40 41 42 43 44 45 46 47 48 49 50
skb->len: 33 34 35 36 37 38 39 40 41 42 43 44
rtab[skb->len/8]: 1 1 1 1 1 1 1 x2 x2 x2 2 2
skb->len-3: 30 31 32 33 34 35 36 37 38 39 40 41
rtab[(skb->len-3)/8]: 1 1 1 1 1 1 1 1 1 1 2 2
Overhead 7
----------
ATM Payload: 39 40 41 42 43 44 45 46 47 48 49 50
skb->len: 32 33 34 35 36 37 38 39 40 41 42 43
rtab[skb->len/8]: 1 1 1 1 1 1 1 1 x2 x2 2 2
skb->len-2: 30 31 32 33 34 35 36 37 38 39 40 41
rtab[(skb->len-2)/8]: 1 1 1 1 1 1 1 1 1 1 2 2
Overhead 8
----------
ATM Payload: 39 40 41 42 43 44 45 46 47 48 49 50
skb->len: 31 32 33 34 35 36 37 38 39 40 41 42
rtab[skb->len/8]: 1 1 1 1 1 1 1 1 1 x2 2 2
skb->len-1: 30 31 32 33 34 35 36 37 38 39 40 41
rtab[(skb->len-1)/8]: 1 1 1 1 1 1 1 1 1 1 2 2
Overhead 9
----------
ATM Payload: 39 40 41 42 43 44 45 46 47 48 49 50
skb->len: 30 31 32 33 34 35 36 37 38 39 40 41
rtab[skb->len/8]: 1 1 1 1 1 1 1 1 1 1 2 2
skb->len-0: 30 31 32 33 34 35 36 37 38 39 40 41
rtab[(skb->len-0)/8]: 1 1 1 1 1 1 1 1 1 1 2 2
Notice that at overhead 8 has an alignment of -1, and this
is the same alignment as overhead 0 had. This is because
the errors have moved through one rate table entry and are
now into the next one, so the "fix" is the same.
We can draw up a table showing the required alignment for
each overhead, like this:
overhead | alignment
---------+----------
0 | -1
1 | 0
2 | -7
3 | -6
4 | -5
5 | -4
6 | -3
7 | -2
We only need 8 elements, as once we get to 8 (or more
accurately 1<<cell_log) it repeats. If the kernel
adds the alignment figure from the table to the packet
size before looking up the rate table, then it will
always get accurate results from the optimal table for
the unpatched kernel.
The alignment figure is calculated by "tc" and passed to
the kernel, much as you passed the "overhead" to the
kernel in your code. In my case it will always be small.
And finally, we come to ......
tc_core.c line 122, in tc_calc_cell_align. The expression
I use to calculate the alignment is:
return (overhead + cell_size - 2) % cell_size - cell_size + 1
You could try and mathematically prove it does yield the
table above, but since there are only 8 values that
matter (the % in the expression ensures it repeats after
that), it much easier to just plug in the overheads 0..7
and verify it does come up with the correct alignment.
QED, for a cell_log of 3 anyway.
--- End Message ---
_______________________________________________
LARTC mailing list
LARTC@xxxxxxxxxxxxxxx
http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc