Kernel panic when using wrr qdisc

Linux Advanced Routing and Traffic Control


 



For some time now I have been trying to find out what causes kernel panics
with the following QoS configuration (kernel 2.4.25-2.4.28 + IMQ patch and,
of course, wrr):

--
WRR_MAX_CLASSES=200
DEV_IN=imq0
ifconfig $DEV_IN down
ifconfig $DEV_IN up
tc qdisc add dev ${DEV_IN} handle 1:0 root htb default 10
tc class add dev ${DEV_IN} parent 1:0 classid 1:10 htb rate 1000kbit burst 1kbit prio 1
tc qdisc add dev ${DEV_IN} parent 1:10 handle 10: htb default 190
tc class add dev ${DEV_IN} parent 10: classid 10:10 htb rate 1000kbit burst 20kbit prio 1
tc class add dev ${DEV_IN} parent 10:10 classid 10:190 htb rate 500kbit ceil 900kbit burst 10k prio
tc qdisc add dev ${DEV_IN} parent 10:190 handle 190: wrr dest ip $WRR_MAX_CLASSES 0
###tc qdisc add dev ${DEV_IN} parent 10:190 handle 190:0 esfq perturb 10 limit 64 depth 64 divisor 11

echo "IMQ dev 0 start"
/sbin/iptables -F -t mangle
/sbin/iptables -A PREROUTING -t mangle -i $DEV_OUT -j IMQ --todev 0
--

Running the above and testing it with random packets generated by Nemesis
(http://nemesis.sourceforge.net) results in a kernel panic, reported against
a random process PID.

The packet generation script (destination addresses altered):

File router_killer:
--
#!/usr/bin/perl
$port = 9898;
for ($c = 1; $c < 100; ++$c) {
   for ($i = 1; $i < 254; ++$i) {
      `nemesis tcp -v -S 222.65.61.222 -D 80.50.30.$i -fSA -w 64800 -T 128 -y $port`;
      `nemesis tcp -v -S 222.65.61.222 -D 80.50.31.$i -fSA -w 64800 -T 128 -y $port`;
   }
}
exit(0);
--

For the kernel panic to occur, the subnets 80.50.30.0/24 and 80.50.31.0/24
must be routed through the router which runs IMQ.

Generally speaking, it looks as if the wrr algorithm from the implementation
found at http://wipl-wrr.dkik.dk/wrr/ cannot handle a large number of
distinct IP addresses arriving in a short period of time.

The tests we ran showed that the problem lies (with 99% certainty) in the WRR
code. A few seconds after starting the "router_killer" script, the kernel
panics. Changing the queueing algorithm from WRR to ESFQ (in fact, commenting
out the wrr line of the first script and enabling the esfq line) helps: the
router then works properly under heavy load (the simulated attack). That is
how I came to the conclusion that the problem lies in WRR itself.

The sources of the algorithm were modified in many ways, and everything ended
the same way: a kernel panic. From our observations we can assume this is not
a problem of the queue filling up, because the algorithm should handle that
case well: if a packet does not fit in the queue, it gets dropped, as is all
further traffic until the queue is freed again. And, as we know, the queue is
controlled by the Qdisc structure and its mechanisms, not by WRR:

   if ((retvalue = qdisc->enqueue(skb, qdisc)) == ENQUEUE_SUCCESS) {
      // Successful
      sch->stats.packets++;
      ...

   if (retvalue != ENQUEUE_SUCCESS) {
      // Packet not enqueued:
      sch->stats.drops++;
So it does not look like the queue could simply "fill up".

Can somebody help (the problem is interesting; I'm thinking about posting it
on lartc...)?


PS1.
I have not checked the wrr device implementation, because here wrr is being
used as a class, not as a queueing discipline. It will not work for me.

PS2.
The HTB parameters (rate, ceil) do not matter; the system crashes with every
combination we tried.


_______________________________________________
LARTC mailing list / LARTC@xxxxxxxxxxxxxxx
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/
