Re: Crash in agg-tx.c, with ath9k and lots of STA VIFs.

On Mon, Oct 4, 2010 at 8:39 PM, Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
> On 10/04/2010 04:48 PM, Luis R. Rodriguez wrote:
>>
>> On Mon, Oct 4, 2010 at 2:38 PM, Ben Greear <greearb@xxxxxxxxxxxxxxx>
>> wrote:
>>>
>>> On 10/04/2010 02:13 PM, Luis R. Rodriguez wrote:
>>>>
>>>> On Mon, Oct 4, 2010 at 2:12 PM, Luis R. Rodriguez <mcgrof@xxxxxxxxx>
>>>> wrote:
>>>>>
>>>>> On Mon, Oct 4, 2010 at 12:10 PM, Johannes Berg
>>>>> <johannes@xxxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> On Mon, 2010-10-04 at 12:04 -0700, Ben Greear wrote:
>>>>>>>
>>>>>>> On 10/04/2010 12:01 PM, Johannes Berg wrote:
>>>>>>>>
>>>>>>>> On Mon, 2010-10-04 at 11:51 -0700, Ben Greear wrote:
>>>>>>>>>
>>>>>>>>> Just in case this seems familiar to anyone...
>>>>>>>>>
>>>>>>>>> IP: [<f8ba74da>] ieee80211_stop_tx_ba_session+0x14/0x84 [mac80211]
>>>>>>>>
>>>>>>>> Do you have debug info that'd point to a code line?
>>>>>>>>
>>>>>>>> I have never heard of this.
>>>>>>>
>>>>>>> I don't actually know how to get a line of code out of those
>>>>>>> hex offsets...
>>>>>>>
>>>>>>> Someone told me many years ago..but I lost that information :P
>>>>>>
>>>>>> Err, I never remember either, I think Luis knows the gdb thing ... I
>>>>>> usually use "objdump -dS"
>>>>>
>>>>> gdb net/mac80211/mac80211.ko
>>>>> l *(ieee80211_stop_tx_ba_session+0x14/0x84)
>>>>
>>>> Oops I meant:
>>>>
>>>> gdb net/mac80211/mac80211.ko
>>>> l *(ieee80211_stop_tx_ba_session+0x14)
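
For anyone hitting the same confusion: in an oops line like "func+0x14/0x84", 0x14 is the offset of the faulting instruction inside the function and 0x84 is the function's total size, so gdb only wants the part before the slash. A minimal sketch of that conversion follows; the helper name oops_to_gdb is made up for illustration and is not part of any tool.

```shell
# Turn the symbol part of an oops "IP:" line into a gdb "list" expression.
# oops_to_gdb is a hypothetical helper name used only for this sketch.
oops_to_gdb() {
    sym="$1"              # e.g. ieee80211_stop_tx_ba_session+0x14/0x84
    sym="${sym%%/*}"      # drop the "/0x84" function-size suffix, if present
    printf 'l *(%s)\n' "$sym"
}

# Prints the expression to paste at the (gdb) prompt.
oops_to_gdb 'ieee80211_stop_tx_ba_session+0x14/0x84'
```

The module still has to be built with debug info for gdb to map the offset to a source line.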
>>>
>>> Thanks!
>>>
>>> I had to re-compile with debugging symbols, and added kgdb (hopefully
>>> that won't mess anything up).
>>
>> You may want to look at using netconsole instead if your goal is
>> just to get some oops off the box.
>>
>> CONFIG_NETCONSOLE=m
>>
>> mcgrof@tux ~/bin $ cat netconsole
>> #!/bin/bash
>> sudo dmesg -n 8
>> sudo ip addr add 192.168.4.2/24 dev eth4
>> sudo modprobe netconsole \
>>     netconsole="@192.168.4.2/eth4,@192.168.4.3/00:1e:37:82:48:5a"
>>
>> I'd run that script on the dev box, and on 192.168.4.3 just do
>> `nc -u -l -p 6666 | tee log` (netconsole sends over UDP). To test, just
>> modprobe and rmmod ath9k.
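
The netconsole= string is easy to get wrong, so here is a rough sketch of how it is put together. The documented layout is [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac]; leaving the ports empty uses the kernel defaults (6665 source, 6666 target). The function name netconsole_param is an assumption for illustration only.

```shell
# Build a netconsole= module parameter with default ports:
#   @<src-ip>/<dev>,@<tgt-ip>/<tgt-mac>
# netconsole_param is a hypothetical helper name used only for this sketch.
netconsole_param() {
    src_ip="$1"; dev="$2"; tgt_ip="$3"; tgt_mac="$4"
    printf '@%s/%s,@%s/%s\n' "$src_ip" "$dev" "$tgt_ip" "$tgt_mac"
}

# Same addresses as the script above; print the command instead of running it.
echo "modprobe netconsole netconsole=\"$(netconsole_param 192.168.4.2 eth4 192.168.4.3 00:1e:37:82:48:5a)\""
```

Printing the command first makes it easy to check the addresses before loading the module as root.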
>>
>>> Reading symbols from
>>> /home/greearb/kernel/2.6/wireless-testing-dbg.p4s/net/mac80211/mac80211.ko...done.
>>> (gdb) l *(ieee80211_stop_tx_ba_session+0x14)
>>> 0x54fe is in ieee80211_stop_tx_ba_session
>>> (/home/greearb/git/linux.wireless-testing/net/mac80211/agg-tx.c:595).
>>> 590
>>> 591     int ieee80211_stop_tx_ba_session(struct ieee80211_sta *pubsta, u16 tid)
>>> 592     {
>>> 593             struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
>>> 594             struct ieee80211_sub_if_data *sdata = sta->sdata;
>>> 595             struct ieee80211_local *local = sdata->local;
>>
>> What was the oops complaint? NULL pointer dereference? If sdata got
>> screwed up that would be pretty serious, the only way that could
>> happen is if somehow it managed to get removed prior to the
>> ieee80211_stop_tx_ba_session() or if there is some sort of memory
>> corruption. What steps do you follow to reproduce?
>
> It's dying trying to dereference something, probably sdata, but for some
> reason I didn't think it was NULL. (I was having trouble getting clean
> stack dumps on the serial console on top of my other issues today.) In a
> probably-similar crash it was trying to dereference 0x00100104 (see my
> 3:42 email in this series).
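
One possible reading of that address, not confirmed anywhere in this thread: on 32-bit x86, list_del() poisons a removed entry's next pointer with LIST_POISON1 (0x00100100) and its prev pointer with LIST_POISON2 (0x00200200), so a fault at 0x00100104 is 4 bytes past LIST_POISON1. That pattern often shows up when a list entry is used again after being deleted. The sketch below just does the arithmetic; classify_fault is a made-up name, and the constants assume POISON_POINTER_DELTA is 0.

```shell
# List poison values from include/linux/poison.h, assuming 32-bit x86
# where POISON_POINTER_DELTA is 0.
LIST_POISON1=$((0x00100100))
LIST_POISON2=$((0x00200200))

# Report whether a fault address lies within 8 bytes above a poison value.
# classify_fault is a hypothetical helper name used only for this sketch.
classify_fault() {
    addr=$(($1))
    off1=$(($addr - $LIST_POISON1))
    off2=$(($addr - $LIST_POISON2))
    if [ "$off1" -ge 0 ] && [ "$off1" -lt 8 ]; then
        echo "LIST_POISON1+$off1"
    elif [ "$off2" -ge 0 ] && [ "$off2" -lt 8 ]; then
        echo "LIST_POISON2+$off2"
    else
        echo "no match"
    fi
}

classify_fault 0x00100104
```

If the address does match, the struct member being touched at the faulting line is probably a list_head, which narrows down where to look.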
>
> I added printks to the stop_tx_ba_session method to try to figure out what
> was happening, but of course then I could no longer reproduce it, or at
> least it crashed in cfg80211_unlink_bss first.
>
> To reproduce, I have a user-space app that creates 130 or so STA devices,
> starts wpa_supplicant for each one, and then watches events with
> 'iw event', and reads /proc/net/wireless quite often (and grabs some other
> stats out of debugfs, etc). It runs 'iwconfig' and parses output for other
> stats. In short, it does a bunch of things that would be hard to reproduce
> with any simple script. The user-space app is proprietary, though I would
> of course give you a free binary and help you set it up should you wish to
> use it.
>
> When I disabled power-save, it ran a lot longer, but it would still
> hard-hang or occasionally crash with a stack trace pointing to the
> 0x00100104 dereference.
>
> Perhaps related, with power-save disabled, after a while (maybe 10-20
> minutes), the system would often get to a state where the ath9k no longer
> showed any additional transmitted packets in its debugfs traffic. The
> netdevices (sta1, etc), would show tx pkt counters increasing, and the
> qdiscs showed no backlog. It was getting rx interrupts, but no tx,
> according to debugfs output. I didn't get any chance to debug that any
> further.
>
> We have much better luck with ath5k in general, so I think most of these
> issues are related to ath9k and/or 802.11n in general. But, even so, we do
> see deadlocks (on rtnl_lock, it seems) with ath5k, and I still have some
> lockdep warnings to deal with in the mac80211 code, so it's possible the
> problem is more general and ath9k just triggers it much more easily.

Can you try with mac80211_hwsim?
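
A rough sketch of what a hwsim-based setup might look like, in case it helps: load the module, then add station VIFs on one simulated phy with iw. This only prints the commands for review. The phy name "phy0" is an assumption (check "iw phy" after loading), and whether this reproduces the crash without the rest of your app is unknown.

```shell
# Print (do not run) commands that load mac80211_hwsim and create N station
# VIFs named sta1..staN on one simulated radio.
# hwsim_sta_cmds is a hypothetical helper; the default phy name is an assumption.
hwsim_sta_cmds() {
    n="$1"; phy="${2:-phy0}"
    echo "modprobe mac80211_hwsim radios=2"
    i=1
    while [ "$i" -le "$n" ]; do
        echo "iw phy $phy interface add sta$i type managed"
        i=$((i + 1))
    done
}

hwsim_sta_cmds 3
```

Pipe the output to "sudo sh" once the phy name looks right, and raise the count toward the 130 or so you use with ath9k.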

  Luis