On Tuesday 02 October 2012 07:06:03 Adrian Chadd wrote:
> Hm, there are still issues on Hornet?

Yes, we still have problems with Hornet. The issue I am trying to "fix" with this patch is an interrupt storm on AR9330 devices with sta interface(s). Random devices crash after printing a stack trace that reports __report_bad_irq. The crash results in either a reboot or a hang of the device.

[ 952.950000] irq 2: nobody cared (try booting with the "irqpoll" option)
[ 952.950000] Call Trace:
[ 952.950000] [<8026ade8>] dump_stack+0x8/0x34
[ 952.950000] [<800a75d0>] __report_bad_irq+0x44/0xf4
[ 952.950000] [<800a78ec>] note_interrupt+0x200/0x2a4
[ 952.950000] [<800a58c8>] handle_irq_event_percpu+0x19c/0x1e0
[ 952.950000] [<800a86cc>] handle_percpu_irq+0x54/0x88
[ 952.950000] [<800a501c>] generic_handle_irq+0x3c/0x4c
[ 952.950000] [<80064748>] do_IRQ+0x1c/0x34
[ 952.950000] [<80062d6c>] ret_from_irq+0x0/0x4
[ 952.950000] [<8007673c>] tasklet_action+0xb8/0xd4
[ 952.950000] [<80076c24>] __do_softirq+0xa0/0x154
[ 952.950000] [<80076e30>] do_softirq+0x48/0x68
[ 952.950000] [<80076f94>] local_bh_enable+0x94/0xb0
[ 952.950000] [<83406d60>] cfg80211_scan_done+0x670/0x6d0 [cfg80211]
[ 952.950000]
[ 952.950000] handlers:
[ 952.950000] [<83564d48>] ath_isr
[ 952.950000] Disabling IRQ #2

The test setup uses 30 AR9330 devices running OpenWRT 32727/33559. 32727 uses compat-wireless-2012-04-17 (+ many OpenWRT patches) and 33559 uses compat-wireless-2012-09-07 (+ many more patches from Felix). One device runs an open AP (standard OpenWRT settings) and the other 29 devices try to connect to it. Random devices then fail.

To debug this problem, I used one device with 8 vifs and restarted the network script again and again to force the vifs to be recreated and to reconnect.

The stack trace doesn't seem to be very helpful. Therefore, I checked ath_isr and noticed that the interrupts right before the device crashes get the status 0 from ar9003_hw_get_isr.
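To illustrate how a handler that keeps seeing status 0 can end in "Disabling IRQ #2", here is a rough, simplified model of the accounting done by note_interrupt. It is only a sketch, not the kernel or ath9k code: the struct and function names are made up for illustration, the time-based reset of the unhandled counter is left out, and it assumes that the handler reports an interrupt with status 0 as not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Thresholds as used by note_interrupt() in kernel/irq/spurious.c. */
#define IRQ_WINDOW          100000u
#define IRQ_UNHANDLED_LIMIT  99900u

/* Hypothetical, simplified per-line accounting state (not struct irq_desc). */
struct irq_model {
	unsigned int irq_count;      /* interrupts seen in the current window */
	unsigned int irqs_unhandled; /* of those, how many nobody claimed */
	bool disabled;               /* set once the line is shut off */
};

/* Stand-in for the handler: assumed to claim the interrupt only if status != 0. */
static bool model_isr_handled(unsigned int status)
{
	return status != 0;
}

/* Account one interrupt; disable the line if a full window was almost all unhandled. */
static void model_note_interrupt(struct irq_model *m, bool handled)
{
	assert(m != NULL);
	if (m->disabled)
		return;
	if (!handled)
		m->irqs_unhandled++;
	if (++m->irq_count < IRQ_WINDOW)
		return;
	/* Window is full: evaluate it and start a new one. */
	m->irq_count = 0;
	if (m->irqs_unhandled > IRQ_UNHANDLED_LIMIT)
		m->disabled = true; /* corresponds to "nobody cared" + "Disabling IRQ" */
	m->irqs_unhandled = 0;
}

/* Fire interrupts with a fixed status; return how many it took to disable the line, or -1. */
static long count_until_disabled(unsigned int status, long max_irqs)
{
	struct irq_model m = { 0, 0, false };
	long n;

	for (n = 1; n <= max_irqs; n++) {
		model_note_interrupt(&m, model_isr_handled(status));
		if (m.disabled)
			return n;
	}
	return -1;
}
```

In this model a storm of interrupts with status stuck at 0 gets the line disabled once the first window of 100000 interrupts is full, while any non-zero status keeps it enabled.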
Digging a little bit further also revealed that the interrupts in the interrupt storm have async_cause 0 and sync_cause 0x20. This sync cause 0x20 isn't handled anywhere and may be the cause of the hang/crash. At least this is the symptom which can be fixed without crashing the system.

I hope that helps to track down the problem.

Kind regards,
	Sven