Seems solved! Maybe a little premature, but I need to blurt this out...

I don't know why, but I think I was not thorough enough with my kernel config:

- I have a so-called ALIX board featuring a Geode LX 800.
- I had tried setting the processor type to Geode GX/LX, but that kernel
  did not boot (it hangs somewhere).
- I didn't bother to find out why and compiled for 486 instead (which worked).
- In the meantime, I copied/merged a kernel .config from another branch with
  processor type = 586/686/etc. I did not notice this, but it seemed to work
  all the time (except maybe for that last kernel BUG).
- Now I compiled for Geode GX/LX again and set CONFIG_GEODE_MFGPT_TIMER=n
  (as per this info: https://kerneltrap.org/mailarchive/linux-kernel/2008/1/20/585236),
  which makes my kernel boot and SEEMS TO MAKE THAT CRASH GO AWAY!!!

At least I haven't observed the crash in the last 60 minutes, whereas before
it took only 1-2 minutes every time to trigger it. I will test this all day.

The three patches mentioned before are applied, my app-level timeout is still
gone, and the "dropped filtered TX" messages are gone as well.

Christian, should I actually test your p54-sta-flags-v3 patch?

Regards,
Stefan.

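For reference, the combination described above would correspond roughly to
the following .config fragment. This is only a sketch: CONFIG_GEODE_MFGPT_TIMER
is the option named above, while CONFIG_MGEODE_LX is assumed to be the symbol
behind the "Geode GX/LX" processor type and should be checked against the
actual tree.

```
# Processor family: Geode GX/LX (assumed symbol name, verify in your tree)
CONFIG_MGEODE_LX=y
# MFGPT timer disabled; this is how "=n" appears in a generated .config
# CONFIG_GEODE_MFGPT_TIMER is not set
```
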
2008/11/26 Christian Lamparter <chunkeey@xxxxxx>:
> On Wednesday 26 November 2008 14:38:59 Stefan Steuerwald wrote:
>> console [netcon0] enabled
>> netconsole: network logging started
>> BUG: unable to handle kernel NULL pointer dereference at 00000038
>> IP: [<d08260fa>] p54_assign_address+0x67/0x14b [p54common]
>> *pde = 00000000
>> Oops: 0000 [#1]
>> last sysfs file: /sys/class/net/lo/operstate
>> Modules linked in: netconsole ipv6 loop evdev ehci_hcd ohci_hcd
>> rtc_cmos rtc_core pcspkr rtc_lib p54pci usbcore via_rhine p54common
>> geode_aes mii [last unloaded: netconsole]
>>
>> Pid: 0, comm: swapper Not tainted (2.6.28-rc6-wl #16)
>> EIP: 0060:[<d08260fa>] EFLAGS: 00010002 CPU: 0
>> EIP is at p54_assign_address+0x67/0x14b [p54common]
>> EAX: cf98b178 EBX: cf86ee40 ECX: 00000000 EDX: 00000000
>> ESI: 000000f8 EDI: 00000000 EBP: 0002027c ESP: c03f9c4c
>> DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
>> Process swapper (pid: 0, ti=c03f8000 task=c03c4380 task.ti=c03f8000)
>> Stack:
>> 00000002 ce4d5880 ce4c48b4 cf86e1a0 00000000 00000038 00020200 00000286
>> cf86ee40 00000004 ce4d58b2 ce4d588c d0826fd7 00000090 014c48d4 ce4c48b4
>> cf86e1a0 0086ee40 00000004 02000282 ce4c48d4 cf86ef10 cf86ee40 ce4d5880
>> Call Trace:
>> [<d0826fd7>] p54_tx+0x416/0x482 [p54common]
>> [<c02fb7c2>] __ieee80211_tx+0x35/0xf8
>> [<c02fc235>] ieee80211_master_start_xmit+0x2ab/0x396
>> [<c01048d3>] common_interrupt+0x23/0x30
>> [<c0297368>] dev_hard_start_xmit+0x16e/0x1c9
>> [<c02a3518>] __qdisc_run+0xa2/0x15c
>> [<c0297796>] dev_queue_xmit+0x2f5/0x3c5
>> [<c02f8608>] ieee80211_invoke_rx_handlers+0x488/0x1486
>> [<c02d9d14>] bictcp_cong_avoid+0x10/0x160
>> [<c02bd904>] tcp_ack+0x16f0/0x1850
>> [<c01170f0>] enqueue_task_fair+0x12a/0x16b
>> [<c02c0c37>] tcp_current_mss+0x6b/0xe4
>> [<c02f9b50>] __ieee80211_rx_handle_packet+0x54a/0x56d
>> [<c02fa1fe>] __ieee80211_rx+0x491/0x4e3
>> [<c02ec95d>] ieee80211_tasklet_handler+0x60/0xd6
>> [<c011cfae>] tasklet_action+0x3e/0x64
>> [<c011d305>] __do_softirq+0x4a/0xbc
>> [<c011d399>] do_softirq+0x22/0x26
>> [<c011d44f>] irq_exit+0x25/0x55
>> [<c0105996>] do_IRQ+0x5a/0x6c
>> [<c01048d3>] common_interrupt+0x23/0x30
>> [<c0108743>] default_idle+0x25/0x38
>> [<c0102926>] cpu_idle+0x41/0x5b
>> Code: 0f 84 01 01 00 00 9c 8f 44 24 1c fa 8b 53 10 31 ff 89 6c 24 18
>> 89 14 24 31 d2 eb 3f 8b 4c 24 10 83 c1 38 89 4c 24 14 8b 4c 24 10 <8b>
>> 41 38 29 e8 85 d2 75 0d 39 f0 72 09 8b 51 04 29 f0 89 6c 24
>> EIP: [<d08260fa>] p54_assign_address+0x67/0x14b [p54common] SS:ESP 0068:c03f9c4c
>> Kernel panic - not syncing: Fatal exception in interrupt
>>
> wt*, this bug is "impossible":
>
> The bug happens when p54_assign_address looks for free space for a new frame.
> Here's the code:
> [...]
>       if (!skb)
>               return -EINVAL; <--- we don't accept "null" skbs
>
>       spin_lock_irqsave(&priv->tx_queue.lock, flags); <--- we are under a spin_lock with irqs disabled
>       left = skb_queue_len(&priv->tx_queue);
>       while (left--) {
>               u32 hole_size;
>               info = IEEE80211_SKB_CB(entry); <--- Here it BUGs,
> [...]
>
> Your binary module says that skb->cb is at offset 0x38,
> so our "entry" really is NULL right when it BUGs.
> And this can only mean that the queue was
> modified "outside" of our driver.
>
> We always take the spin_lock_irqsave (of course, only on "our" tx_queue)
> whenever we need to do anything with the data in the queue.
>
> Of course, the packet was queued somewhere in mac80211 while the station
> was sleeping, so maybe mac80211 still holds a reference to it, but then
> other drivers would have spotted this misbehaviour a long time ago...
>
> So? Back to square one... I guess.
>
> Regards,
> Chr
>
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html