Re: p54: AP mode: no data frame despite traffic indication set in TIM

"Stefan Steuerwald" <salsasepp@xxxxxxxxxxxxxx> · Thu, 27 Nov 2008 09:57:11 +0100

Seems solved! Maybe a little premature, but I need to blurt this out...

Don't know why, but I think I was not thorough enough in my kernel config:

- I have a so-called alix board featuring a Geode LX 800
- I had tried to set processor type to Geode GX/LX, but that did not
boot (hangs somewhere)
- I didn't bother to find out why, but compiled for 486 instead (worked)
- In the meantime, I copied/merged a kernel .config from another
branch with processor type = 586/686/etc, which went unnoticed by me,
but seemed to work all the time (except maybe for that last kernel
BUG)
- Now I compiled for Geode GX/LX again, and set
CONFIG_GEODE_MFGPT_TIMER=n (as per this info here:
https://kerneltrap.org/mailarchive/linux-kernel/2008/1/20/585236)
which makes my kernel boot and SEEMS TO MAKE THAT CRASH GO AWAY!!!

At least I haven't observed the crash in the last 60 minutes, whereas
before it showed up within 1-2 minutes every time.
I will keep testing this all day.

The three patches mentioned before are applied, and my app-level
timeout is still gone, and the "dropped filtered TX" messages are gone
as well.

Christian, should I actually test your p54-sta-flags-v3 patch?

Regards,
  Stefan.

2008/11/26 Christian Lamparter <chunkeey@xxxxxx>:
> On Wednesday 26 November 2008 14:38:59 Stefan Steuerwald wrote:
>> console [netcon0] enabled
>> netconsole: network logging started
>> BUG: unable to handle kernel NULL pointer dereference at 00000038
>> IP: [<d08260fa>] p54_assign_address+0x67/0x14b [p54common]
>> *pde = 00000000
>> Oops: 0000 [#1]
>> last sysfs file: /sys/class/net/lo/operstate
>> Modules linked in: netconsole ipv6 loop evdev ehci_hcd ohci_hcd
>> rtc_cmos rtc_core pcspkr rtc_lib p54pci usbcore via_rhine p54common
>> geode_aes mii [last unloaded: netconsole]
>>
>> Pid: 0, comm: swapper Not tainted (2.6.28-rc6-wl #16)
>> EIP: 0060:[<d08260fa>] EFLAGS: 00010002 CPU: 0
>> EIP is at p54_assign_address+0x67/0x14b [p54common]
>> EAX: cf98b178 EBX: cf86ee40 ECX: 00000000 EDX: 00000000
>> ESI: 000000f8 EDI: 00000000 EBP: 0002027c ESP: c03f9c4c
>>  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
>> Process swapper (pid: 0, ti=c03f8000 task=c03c4380 task.ti=c03f8000)
>> Stack:
>>  00000002 ce4d5880 ce4c48b4 cf86e1a0 00000000 00000038 00020200 00000286
>>  cf86ee40 00000004 ce4d58b2 ce4d588c d0826fd7 00000090 014c48d4 ce4c48b4
>>  cf86e1a0 0086ee40 00000004 02000282 ce4c48d4 cf86ef10 cf86ee40 ce4d5880
>> Call Trace:
>>  [<d0826fd7>] p54_tx+0x416/0x482 [p54common]
>>  [<c02fb7c2>] __ieee80211_tx+0x35/0xf8
>>  [<c02fc235>] ieee80211_master_start_xmit+0x2ab/0x396
>>  [<c01048d3>] common_interrupt+0x23/0x30
>>  [<c0297368>] dev_hard_start_xmit+0x16e/0x1c9
>>  [<c02a3518>] __qdisc_run+0xa2/0x15c
>>  [<c0297796>] dev_queue_xmit+0x2f5/0x3c5
>>  [<c02f8608>] ieee80211_invoke_rx_handlers+0x488/0x1486
>>  [<c02d9d14>] bictcp_cong_avoid+0x10/0x160
>>  [<c02bd904>] tcp_ack+0x16f0/0x1850
>>  [<c01170f0>] enqueue_task_fair+0x12a/0x16b
>>  [<c02c0c37>] tcp_current_mss+0x6b/0xe4
>>  [<c02f9b50>] __ieee80211_rx_handle_packet+0x54a/0x56d
>>  [<c02fa1fe>] __ieee80211_rx+0x491/0x4e3
>>  [<c02ec95d>] ieee80211_tasklet_handler+0x60/0xd6
>>  [<c011cfae>] tasklet_action+0x3e/0x64
>>  [<c011d305>] __do_softirq+0x4a/0xbc
>>  [<c011d399>] do_softirq+0x22/0x26
>>  [<c011d44f>] irq_exit+0x25/0x55
>>  [<c0105996>] do_IRQ+0x5a/0x6c
>>  [<c01048d3>] common_interrupt+0x23/0x30
>>  [<c0108743>] default_idle+0x25/0x38
>>  [<c0102926>] cpu_idle+0x41/0x5b
>> Code: 0f 84 01 01 00 00 9c 8f 44 24 1c fa 8b 53 10 31 ff 89 6c 24 18
>> 89 14 24 31 d2 eb 3f 8b 4c 24 10 83 c1 38 89 4c 24 14 8b 4c 24 10 <8b>
>> 41 38 29 e8 85 d2 75 0d 39 f0 72 09 8b 51 04 29 f0 89 6c 24
>> EIP: [<d08260fa>] p54_assign_address+0x67/0x14b [p54common] SS:ESP 0068:c03f9c4c
>> Kernel panic - not syncing: Fatal exception in interrupt
>>
> wt*, this bug is "impossible":
>
> The bug happens when p54_assign_address looks for a free space for a new frame:
> here's the code:
> [...]
> if (!skb)
>        return -EINVAL;   <--- we don't accept "null" skbs
>
> spin_lock_irqsave(&priv->tx_queue.lock, flags); <--- we are under a spin_lock with irq disabled
> left = skb_queue_len(&priv->tx_queue);
> while (left--) {
>                u32 hole_size;
>                info = IEEE80211_SKB_CB(entry);  <--- Here it BUGs,
> [...]
>
> Your binary module says that skb->cb is at offset 0x38,
> so our "entry" really is NULL right when it BUGs.
> And that can only mean that the queue was
> modified "outside" of our driver,
>
> since we always take the spin_lock_irqsave (of course,
> only on "our" tx_queue) if we need to do anything with the data in the queue.
>
> Of course, the packet was queued somewhere in mac80211 while the station
> was sleeping, so maybe mac80211 still holds a reference to it, but then
> other drivers would have spotted this misbehaviour a long time ago...
>
> So? back to square one... I guess.
>
> Regards,
>        Chr
>