(list admin, please cancel the same post from my other email address -- forgot to change it on first submission)

I need to set up QoS on a Linux router/firewall I maintain. I spent 10 hours reading everything I could find on QoS/HTB/iproute2 and came up with what I thought made sense for my situation. So I deployed it and BOOM! KERNEL PANIC! Not what I was expecting... now the debugging begins. I have reproduced the panic twice on two different (yet almost identically configured) machines, and I can reproduce it on demand with a specific set of actions.

First, my setup: I have two machines at different locations connected via the internet. Both are stock Fedora Core 1, kernel 2.4.22-1.2179.nptl. I run FreeS/WAN (stock FC binary RPMs) between the two machines for an IPsec VPN, and I route VoIP, VNC and all other inter-office traffic through the VPN. The internet connection is ADSL with 400 kbit/s up and roughly 1500 kbit/s down. VoIP is routed but not MASQ'd; VNC is MASQ'd (neither the originating nor destination machines are the Linux boxes themselves).

Second, my goals:
- Give a fixed minimum bandwidth and high priority to VoIP through the VPN.
- Same, but less so, for VNC through the VPN.
- Give the VPN a high enough allocation for VoIP and VNC to get through OK.
- Less important little tweaks for rarely-used outside (non-IPsec) VNC and ssh access.

My situation seems different from the examples I've seen because *I believe* I need two completely separate qdiscs: one for ppp0 (the DSL) and one for ipsec0 (the FreeS/WAN VPN). Yet ipsec0 eventually goes out over ppp0, so the two are intertwined. I have a funny feeling this is where the crash is coming from. See my setup script near the bottom of this email (excuse the wrapping).

Everything seemed to go great until I tried VNC'ing in from one office to the other. The VNC screen would pop up, do a first draw, then completely freeze. From that point on the remote Linux router is frozen -- kernel panic.
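For what it's worth, while debugging I've been watching which classes and marks the traffic actually lands in. This is just the read-only inspection I run on my boxes (device names and class numbers are the ones from my script further down; needs root and iproute2's tc):

```shell
# Per-class byte/packet counters on both sides -- shows whether traffic
# is hitting 1:10/1:11 or falling through to the default class 1:13:
tc -s class show dev ipsec0    # VPN-side HTB classes
tc -s class show dev ppp0      # DSL-side HTB classes

# The fw classifiers attached to each root qdisc:
tc -s filter show dev ipsec0 parent 1:0
tc -s filter show dev ppp0 parent 1:0

# Per-rule packet counters for the MARK rules in the mangle table:
iptables -t mangle -L -n -v
```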
Strange that the bug only triggers AFTER sending the 100-200 kB of the initial VNC screen. Looking at my config, I will note a couple of questions I had while writing it that weren't answered in the docs I found:

1. The "tc filter add ... protocol ip" thing confused me. What exactly is the "protocol ip" for? I originally thought it should read "protocol 50" for the IPsec stuff, but that didn't seem to catch the packets, so I switched it back to "ip". Weirdly, while testing with it set to 50 (and having no packets match the rule), there were no crashes.

2. The iptables mangle rules will, in the case of VNC and ssh *over the VPN*, match two rules. I *assume* the last-executing MARK overwrites the previous MARK. If for some reason the marks are ANDed or something, perhaps that is causing the crash (filtering one packet into two buckets?).

3. As I mentioned above, the fact that one qdisc feeds a separate qdisc, because ipsec0 eventually goes out over ppp0, may be a problem? I wish I had seen some examples of this type of setup.

4. I chose HTB instead of CBQ as it seemed simpler (always a good thing) and more suited to my exact needs. Not sure if the bug is in HTB itself or in the general QoS code.
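On question 1, my current understanding (which may well be wrong, so treat this as a guess) is that the "protocol" in a tc filter is the link-layer ethertype, not the IP protocol number -- which would explain why "protocol 50" matched nothing: ESP packets are still ethertype IP. If I wanted to catch ESP by IP protocol directly, rather than via fw marks, I believe it would look something like this u32 filter (untested sketch, class 1:10 as in my script):

```shell
# "protocol ip" selects the ethertype; the u32 match then inspects the
# protocol byte inside the IP header itself (50 = ESP):
tc filter add dev ppp0 parent 1:0 protocol ip u32 \
    match ip protocol 50 0xff flowid 1:10
```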
my setup script:

#!/bin/sh
iext=ppp0     # external DSL interface
isec=ipsec0   # FreeS/WAN VPN interface
ivoi=eth3     # VoIP LAN interface
qosbw=380     # kbit/s, just under the 400 kbit/s uplink

# VNC
iptables -t mangle -A PREROUTING -p tcp --sport 5900 -j MARK --set-mark 11
iptables -t mangle -A PREROUTING -p tcp --dport 5900 -j MARK --set-mark 11
# VoIP
iptables -t mangle -A PREROUTING -i $ivoi -j MARK --set-mark 10
# IPsec (ESP and AH) generated by this box
iptables -t mangle -A OUTPUT -p 50 -j MARK --set-mark 10
iptables -t mangle -A OUTPUT -p 51 -j MARK --set-mark 10
# outside ssh
iptables -t mangle -A OUTPUT -o $iext -p tcp --sport ssh -j MARK --set-mark 12

tc qdisc del dev $isec root >/dev/null 2>&1
tc qdisc add dev $isec root handle 1:0 htb default 13
tc class add dev $isec parent 1:0 classid 1:1 htb rate "$qosbw"kbit ceil "$qosbw"kbit
tc class add dev $isec parent 1:1 classid 1:10 htb rate 160kbit ceil "$qosbw"kbit
tc class add dev $isec parent 1:1 classid 1:11 htb rate 210kbit ceil "$qosbw"kbit
tc class add dev $isec parent 1:1 classid 1:13 htb rate 10kbit ceil "$qosbw"kbit
tc qdisc add dev $isec parent 1:10 handle 110:0 sfq perturb 10
tc qdisc add dev $isec parent 1:11 handle 111:0 sfq perturb 10
tc qdisc add dev $isec parent 1:13 handle 113:0 sfq perturb 10
tc filter add dev $isec parent 1:0 protocol ip handle 10 fw flowid 1:10
tc filter add dev $isec parent 1:0 protocol ip handle 11 fw flowid 1:11

tc qdisc del dev $iext root >/dev/null 2>&1
tc qdisc add dev $iext root handle 1:0 htb default 13
tc class add dev $iext parent 1:0 classid 1:1 htb rate "$qosbw"kbit ceil "$qosbw"kbit
tc class add dev $iext parent 1:1 classid 1:10 htb rate 300kbit ceil "$qosbw"kbit
tc class add dev $iext parent 1:1 classid 1:11 htb rate 50kbit ceil "$qosbw"kbit
tc class add dev $iext parent 1:1 classid 1:12 htb rate 20kbit ceil "$qosbw"kbit
tc class add dev $iext parent 1:1 classid 1:13 htb rate 10kbit ceil "$qosbw"kbit
tc qdisc add dev $iext parent 1:10 handle 110:0 sfq perturb 10
tc qdisc add dev $iext parent 1:11 handle 111:0 sfq perturb 10
tc qdisc add dev $iext parent 1:12 handle 112:0 sfq perturb 10
tc qdisc add dev $iext parent 1:13 handle 113:0 sfq perturb 10
tc filter add dev $iext parent 1:0 protocol ip handle 10 fw flowid 1:10
tc filter add dev $iext parent 1:0 protocol ip handle 11 fw flowid 1:11
tc filter add dev $iext parent 1:0 protocol ip handle 12 fw flowid 1:12

The info dumped on-screen from the kernel panic. I couldn't find any way to scroll up, didn't have sysrq enabled, and didn't have the ability to enable it and reproduce (the system was in live production use during business hours!). I could potentially go back off-hours, reproduce with sysrq, and hopefully get more info. There may be slight typos, as this was manually copied to paper and then back into this email!

... anything above not visible ...
eax: 0         ebx: 8         ecx: 1         edx: d741c001
esi: c0384000  edi: 0         ebp: d741c009  esp: c0385ef8
ds: 68  es: 68  ss: 60
process (pid 0, stackpage c0385000)
stack: ddfc9000 ddfc9244 c0385f40 0 1 c0385f8c 0 c0125e93 c038400
       0 1 0 c0385f8c 20000001 c0126278 0 c010ed90 c0385f8c
       c033ddb0 20000001 c010b1a5 0 0 c0385f8e
call trace:
  c0125e93  update_process_times [k] 0x33  (0xc0385f14)
  c0126278  do_timer [k] 0x28             (0xc0385f30)
  c010ed90  timer_interrupt [k] 0x80      (0xc0385f38)
  c010b1a5  handle_IRQ_event [k] 0x45     (0xc0385f48)
  c010b324  do_IRQ [k] 0x64               (0xc0385f68)
  c010db28  call_do_IRQ [k] 0x05          (0xc0385f88)
  c0110068  restore_i387 [k] 0x28         (0xc0385fa8)
  c0106fb3  default_idle [k] 0x23         (0xc0385fb4)
  c0115a7c  apm_cpu_idle [k] 0xac         (0xc0385fc0)
  c01159d0  apm_cpu_idle [k] 0x00         (0xc0385fc4)
  c0107032  cpu_idle [k] 0x32             (0xc0385fd4)
  c0105000  stext [k] 0x00                (0xc0385fe0)
code: 01 b8 d0 01 00 00 01 88 d4 01 00 00 b8 1f 85 eb 51 89 96 c4
<0> kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing

Lastly, thanks!
_______________________________________________
LARTC mailing list / LARTC@xxxxxxxxxxxxxxx
http://mailman.ds9a.nl/mailman/listinfo/lartc
HOWTO: http://lartc.org/