On Mon, Oct 4, 2010 at 8:39 PM, Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
> On 10/04/2010 04:48 PM, Luis R. Rodriguez wrote:
>>
>> On Mon, Oct 4, 2010 at 2:38 PM, Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
>>>
>>> On 10/04/2010 02:13 PM, Luis R. Rodriguez wrote:
>>>>
>>>> On Mon, Oct 4, 2010 at 2:12 PM, Luis R. Rodriguez <mcgrof@xxxxxxxxx> wrote:
>>>>>
>>>>> On Mon, Oct 4, 2010 at 12:10 PM, Johannes Berg <johannes@xxxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> On Mon, 2010-10-04 at 12:04 -0700, Ben Greear wrote:
>>>>>>>
>>>>>>> On 10/04/2010 12:01 PM, Johannes Berg wrote:
>>>>>>>>
>>>>>>>> On Mon, 2010-10-04 at 11:51 -0700, Ben Greear wrote:
>>>>>>>>>
>>>>>>>>> Just in case this seems familiar to anyone...
>>>>>>>>>
>>>>>>>>> IP: [<f8ba74da>] ieee80211_stop_tx_ba_session+0x14/0x84 [mac80211]
>>>>>>>>
>>>>>>>> Do you have debug info that'd point to a code line?
>>>>>>>>
>>>>>>>> I have never heard of this.
>>>>>>>
>>>>>>> I don't actually know how to get a line of code out of those
>>>>>>> hex offsets...
>>>>>>>
>>>>>>> Someone told me many years ago..but I lost that information :P
>>>>>>
>>>>>> Err, I never remember either, I think Luis knows the gdb thing ... I
>>>>>> usually use "objdump -dS"
>>>>>
>>>>> gdb net/mac80211/mac80211.ko
>>>>> l *(ieee80211_stop_tx_ba_session+0x14/0x84)
>>>>
>>>> Oops I meant:
>>>>
>>>> gdb net/mac80211/mac80211.ko
>>>> l *(ieee80211_stop_tx_ba_session+0x14)
>>>
>>> Thanks!
>>>
>>> I had to re-compile with debugging symbols, and added kgdb (hopefully
>>> that won't mess anything up).
>>
>> You may want to look at using netconsole instead if your goal is
>> just to get some oops off the box.
>>
>> CONFIG_NETCONSOLE=m
>>
>> mcgrof@tux ~/bin $ cat netconsole
>> #!/bin/bash
>> sudo dmesg -n 8
>> sudo ip addr add 192.168.4.2/24 dev eth4
>> sudo modprobe netconsole netconsole="@192.168.4.2/eth4,@192.168.4.3/00:1e:37:82:48:5a"
>>
>> I'd run that script on the dev box, and on 192.168.4.3 just do
>> `nc -l -p 6666 | tee log`. To test, just modprobe and rmmod ath9k.
>>
>>> Reading symbols from
>>> /home/greearb/kernel/2.6/wireless-testing-dbg.p4s/net/mac80211/mac80211.ko...done.
>>> (gdb) l *(ieee80211_stop_tx_ba_session+0x14)
>>> 0x54fe is in ieee80211_stop_tx_ba_session
>>> (/home/greearb/git/linux.wireless-testing/net/mac80211/agg-tx.c:595).
>>> 590
>>> 591     int ieee80211_stop_tx_ba_session(struct ieee80211_sta *pubsta, u16 tid)
>>> 592     {
>>> 593             struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
>>> 594             struct ieee80211_sub_if_data *sdata = sta->sdata;
>>> 595             struct ieee80211_local *local = sdata->local;
>>
>> What was the oops complaint? NULL pointer dereference? If sdata got
>> screwed up that would be pretty serious; the only way that could
>> happen is if somehow it managed to get removed prior to the
>> ieee80211_stop_tx_ba_session() or if there is some sort of memory
>> corruption. What steps do you follow to reproduce?
>
> It's dying trying to dereference something, probably sdata, but for some
> reason I didn't think it was NULL. (I was having trouble getting clean
> stack dumps on the serial console on top of my other issues today.) In a
> probably-similar crash it was trying to dereference 0x00100104 (see my
> 3:42 email in this series).
>
> I added printks to the stop_tx_ba_session method to try to figure out what
> was happening, but of course then I could no longer reproduce it, or at
> least it crashed in cfg80211_unlink_bss first.
>
> To reproduce, I have a user-space app that creates 130 or so STA devices,
> starts wpa_supplicant for each one, and then watches events with
> 'iw event', and reads /proc/net/wireless quite often (and grabs some other
> stats out of debugfs, etc). It runs 'iwconfig' and parses output for
> other stats. In short, it does a bunch of things that would be hard to
> reproduce with any simple script. The user-space app is proprietary,
> though I would of course give you a free binary and help you set it up
> should you wish to use it.
>
> When I disabled power-save, it ran a lot longer, but it would still
> hard-hang or occasionally crash with a stack trace pointing to the
> 0x00100104 dereference.
>
> Perhaps related, with power-save disabled, after a while (maybe 10-20
> minutes), the system would often get to a state where the ath9k no longer
> showed any additional transmitted packets in its debugfs traffic. The
> netdevices (sta1, etc.) would show tx pkt counters increasing, and the
> qdiscs showed no backlog. It was getting rx interrupts, but no tx,
> according to debugfs output. I didn't get any chance to debug that any
> further.
>
> We have much better luck with ath5k in general, so I think most of these
> issues are related to ath9k and/or /n in general. But, even so, we do see
> deadlocks (on rtnl_lock, it seems) with ath5k, and I still have some
> lockdep warnings to deal with in the mac80211 code, so it's possible the
> problem is more general and ath9k just triggers it much more easily.

Can you try with mac80211_hwsim?

  Luis
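
For reference, the symbol+offset lookup discussed earlier in this thread can be sketched as a small shell helper. This is only an illustrative sketch: `oops_to_gdb` is a hypothetical name (not an existing tool), and it assumes the oops line has the `symbol+0xOFF/0xSIZE` shape shown in the original report. It prints the gdb `l *(...)` command with the `/0xSIZE` part dropped, since gdb wants only the symbol and the offset.

```shell
#!/bin/bash
# oops_to_gdb: hypothetical helper, not part of gdb or the kernel tree.
# Takes an oops line such as
#   IP: [<f8ba74da>] ieee80211_stop_tx_ba_session+0x14/0x84 [mac80211]
# and prints the gdb "list" command for symbol+offset.
oops_to_gdb() {
    local line="$1"
    # symbol, then "+0xOFFSET", then "/0xSIZE" (the size is discarded)
    local re='([A-Za-z0-9_.]+)\+(0x[0-9a-fA-F]+)/0x[0-9a-fA-F]+'
    [[ $line =~ $re ]] || return 1
    printf 'l *(%s+%s)\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
}
```

Assuming the module was built with debugging symbols, as in the thread, the output could be handed to gdb non-interactively with something like `gdb -batch -ex "$(oops_to_gdb "$line")" net/mac80211/mac80211.ko`.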