On Wed, Apr 6, 2016 at 3:40 AM, Johannes Berg <johannes@xxxxxxxxxxxxxxxx> wrote:
> On Tue, 2016-04-05 at 19:46 -0400, Avery Pennarun wrote:
>
>> This test was with backports-20150525 on ath9k. (We have newer
>> versions in the queue, but they haven't rolled out to our customers
>> yet. Anyway, earlier in this thread, I was able to trigger the race
>> condition on much newer backports. Unfortunately the current fix
>> makes my reproducible test case go away, but I don't know any reason
>> to assume the race condition is fixed.)
>
> Well, we know that the timeout is likely unrelated to the issue (other
> than not triggering the broken code path that frequently), so you can
> revert the timeout change for the test case.

Yes. And I can make it happen more often by making it timeout the
aggregation agreement much more frequently than usual.

>> While we're here, unfortunately it turns out that just observing the
>> agg_status file can cause crashes (though not very often... except
>> for a few unlucky customers), probably due to a different race
>> condition.
>> Any suggestions about this one? Stack trace attached below. (I
>> think the stack trace suggests a mac80211 problem?)
>
> That has to be a mac80211 problem, yeah.
> (Side note: I'm a bit surprised this is a 32-bit system?)

We're going for all of good, fast, and cheap here. That should end well :)

> Looks like we use RCU protection to get the data. Can I get the
> mac80211.ko binary (with debug data) corresponding to the crash below?

Yes. Here it is:
http://apenwarr.ca/tmp/mac80211-agg-status-crash.ko

Thanks for your help!