On 11/19/2010 02:27 PM, Luis R. Rodriguez wrote:
On Fri, Nov 19, 2010 at 12:55 PM, Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
On 11/19/2010 09:57 AM, Johannes Berg wrote:
On Fri, 2010-11-19 at 15:34 +0100, Tejun Heo wrote:
Awesome. :-)
Ben, if you have trouble generating a full trace, please let me know if
there's something not too expensive that I can buy to reproduce the
problem. I would be happy to track it down myself.
Maybe you can try Ben's setup in kvm (or directly on your box if you
like) with mac80211_hwsim. From a mac80211 POV it should be almost
equivalent, although it'll do different memory allocation patterns etc.
I tried manually backing out my patch, and now I can no longer reproduce
the problem. Maybe something in -rc2 fixed it, or maybe some changes
to my environment just made it harder to hit.
If you see no logical reason why calling flush_work with RTNL held
would cause trouble, then I guess we can just leave the code as is
for now.
If you do want to play with this yourself, I think any ath5k type adapter
with 64+ virtual stations configured would be a valid test case. My
application calls ifdown/ifup on them a few times after they are created
and then generates traffic (and gathers stats, calls 'iwconfig', etc).
As configured in the original scenario that reproduced the problem,
the STAs had no encryption and were all associating with a single AP.
wpa_supplicant was not being used.
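
For anyone wanting to approximate that scenario by hand, here is a
rough dry-run sketch. It only builds the command lines and does not
execute anything. The parent device name (wlan0), the station names
(sta0, sta1, ...), the SSID, and the number of down/up cycles are all
assumptions made for illustration; they are not taken from the
application described above. MAC address assignment for the virtual
stations is left out entirely.

```python
def station_commands(parent_dev="wlan0", count=64, cycles=3, ssid="test-ap"):
    """Build (but do not run) commands approximating the described test.

    All argument defaults are hypothetical placeholders.
    """
    names = [f"sta{i}" for i in range(count)]
    cmds = []

    # Create the virtual station interfaces on the parent device.
    for name in names:
        cmds.append(["iw", "dev", parent_dev, "interface", "add",
                     name, "type", "managed"])

    # Bounce every interface down and up a few times after creation.
    for _ in range(cycles):
        for name in names:
            cmds.append(["ip", "link", "set", "dev", name, "down"])
            cmds.append(["ip", "link", "set", "dev", name, "up"])

    # Associate with a single open AP (no encryption, no wpa_supplicant)
    # and poll each interface with iwconfig.
    for name in names:
        cmds.append(["iw", "dev", name, "connect", ssid])
        cmds.append(["iwconfig", name])

    return cmds


if __name__ == "__main__":
    # Print a small example instead of touching any real interface.
    for cmd in station_commands(count=2, cycles=1):
        print(" ".join(cmd))
```

Traffic generation and stats gathering are not modelled here; the
sketch covers only the interface bring-up part of the description.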
FWIW, I had to do similar tests before and Ben offered up a perl
script to do something similar to what his proprietary app does upon
device bring up. I've modified it just a bit and you can find it here:
http://www.kernel.org/pub/linux/kernel/people/mcgrof/scripts/poo.pl
Well, I backed out my work-around patch yesterday, and then let
the system run overnight. This morning it is mostly dead, spewing
OOM errors and with a bunch of 'sh' processes using the maximum amount
of CPU, blocked on trying to acquire rtnl.
There is one 'ip' process that appears to hold rtnl and is trying
to call ieee80211_do_stop, which is probably blocked down in
the work-queue logic just like last time. Lots of worker processes
are attempting to grab rtnl (and many other processes as well).
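
The general shape of such a hang can be shown with a small userspace
analogy. This is only a sketch, not kernel code, and it does not claim
to be the actual cause of the trace above: an ordinary lock stands in
for rtnl, a thread stands in for a work item, and waiting on an event
with a timeout stands in for flush_work(). The function and variable
names are made up for the example.

```python
import threading


def flush_times_out(hold_rtnl, timeout=0.5):
    """Return True if waiting for the work item times out.

    hold_rtnl: whether the waiter holds the lock while it waits.
    """
    rtnl = threading.Lock()          # stand-in for the rtnl mutex
    work_done = threading.Event()    # set once the work item has run

    def work_item():
        # The queued work needs the same lock before it can finish.
        with rtnl:
            work_done.set()

    worker = threading.Thread(target=work_item, daemon=True)

    if hold_rtnl:
        rtnl.acquire()               # like the process that holds rtnl
    try:
        worker.start()
        # Stand-in for flush_work(): wait for the work item to complete.
        finished = work_done.wait(timeout)
    finally:
        if hold_rtnl:
            rtnl.release()           # only now can the worker proceed

    worker.join()
    return not finished


if __name__ == "__main__":
    print("waiting with lock held timed out:", flush_times_out(True))
    print("waiting without lock timed out:", flush_times_out(False))
```

With the lock held, the wait can only end by timing out, because the
work item cannot make progress until the lock is released. In the
kernel there is no such timeout, so the equivalent situation would
simply never resolve.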
Lockdep was disabled because the system attempted to load a proprietary
module of mine. The module doesn't actually load, due to a symbol
mismatch (it's compiled against a non-debug kernel).
If the lockdep info is critical, I can attempt to reproduce with
my module completely removed from the file system so it cannot attempt
to load, but it seems like last time the 'sysrq t' was of more interest
anyway.
I have uploaded what I believe is a full 'sysrq t' output, interspersed
with OOM warnings that are constantly spewing to the console,
here:
http://www.candelatech.com/~greearb/minicom_ath9k_log.txt
Thanks,
Ben
--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com