On 06/12/2013 01:46 PM, Johannes Berg wrote:
On Wed, 2013-06-12 at 11:21 -0700, Ben Greear wrote:
In ieee80211_start_tx_ba_session we are accessing and assigning the tid_start_tx
without holding the ampdu_mlme.mtx mutex.

	spin_lock_bh(&sta->lock);
	.....
	tid_tx = rcu_dereference_protected_tid_tx(sta, tid);
	/* check if the TID is not in aggregation flow already */
	if (tid_tx || sta->ampdu_mlme.tid_start_tx[tid]) {
	....
	/*
	 * Finally, assign it to the start array; the work item will
	 * collect it and move it to the normal array.
	 */
	sta->ampdu_mlme.tid_start_tx[tid] = tid_tx;

Elsewhere, in ieee80211_ba_session_work, we access the tid_start_tx
without the sta->lock held, but with the ampdu_mlme.mtx held.
Yeah, that seems wrong.
I think we should probably hold ampdu_mlme.mtx in ieee80211_start_tx_ba_session
or make sure we hold sta->lock in ieee80211_ba_session_work.
Can't hold the mutex there, but we can do the lock (I'll comment on your
patch separately)
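
For illustration only, here is a user-space sketch of the second option, where the worker takes the same lock as the start path whenever it touches tid_start_tx. The lk_* names, the atomic_flag used as a stand-in for sta->lock, and NUM_TIDS = 16 are assumptions made for this model; none of it is mac80211 code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_TIDS 16 /* assumed to mirror IEEE80211_NUM_TIDS */

/* Hypothetical stand-in for the per-station aggregation state. */
struct lk_sta {
	atomic_flag lock;             /* models sta->lock (a spinlock) */
	void *tid_start_tx[NUM_TIDS]; /* sessions parked by the start path */
	void *tid_tx[NUM_TIDS];       /* sessions picked up by the worker */
};

static void lk_init(struct lk_sta *sta)
{
	size_t i;

	atomic_flag_clear(&sta->lock);
	for (i = 0; i < NUM_TIDS; i++) {
		sta->tid_start_tx[i] = NULL;
		sta->tid_tx[i] = NULL;
	}
}

static void lk_lock(struct lk_sta *sta)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&sta->lock,
						 memory_order_acquire))
		;
}

static void lk_unlock(struct lk_sta *sta)
{
	atomic_flag_clear_explicit(&sta->lock, memory_order_release);
}

/* Start path: check both arrays and park the session under the lock. */
static int lk_start(struct lk_sta *sta, size_t tid, void *session)
{
	int ret = -1;

	assert(tid < NUM_TIDS);
	lk_lock(sta);
	if (!sta->tid_tx[tid] && !sta->tid_start_tx[tid]) {
		sta->tid_start_tx[tid] = session;
		ret = 0;
	}
	lk_unlock(sta);
	return ret;
}

/* Worker: collect parked sessions while holding the same lock. */
static void lk_work(struct lk_sta *sta)
{
	size_t tid;

	for (tid = 0; tid < NUM_TIDS; tid++) {
		void *session;

		lk_lock(sta);
		session = sta->tid_start_tx[tid];
		if (session) {
			sta->tid_start_tx[tid] = NULL;
			sta->tid_tx[tid] = session;
		}
		lk_unlock(sta);
	}
}
```

Because both paths serialize on one lock, the worker can never observe a half-updated start slot, and a second start request for the same TID is rejected both before and after the worker has run.
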
unreferenced object 0xffff880219b4de40 (size 192):
  comm "softirq", pid 0, jiffies 4296416789 (age 1257.971s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff815bc02c>] kmemleak_alloc+0x73/0x98
    [<ffffffff8117d4b4>] slab_post_alloc_hook+0x28/0x2a
    [<ffffffff8117f4a6>] kmem_cache_alloc_trace+0xa5/0xcc
    [<ffffffffa0365221>] ieee80211_start_tx_ba_session+0x24b/0x360 [mac80211]
    [<ffffffffa03a98f3>] minstrel_ht_tx_status+0x79a/0x7a9 [mac80211]
    [<ffffffffa035d1cd>] ieee80211_tx_status+0x3af/0x947 [mac80211]
When did this report get printed?
I have a system with 100 or so stations constantly trying to
associate with a set of APs that can handle < 100. This
effectively causes constant churn of re-associations and
associated logic...
Good for shaking out bugs it seems :)
These and other leaks show up after a few minutes of
running this test scenario. It's not a huge number of
leaks, however...so usually stations go away w/out leaking.
I have a feeling what happens is that start is requested, and then
before ieee80211_ba_session_work() gets a chance to run the station is
destroyed.
Should probably have something like this:
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index b429798..aaf68d2 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -149,6 +149,7 @@ static void cleanup_single_sta(struct sta_info *sta)
 	 * directly by station destruction.
 	 */
 	for (i = 0; i < IEEE80211_NUM_TIDS; i++) {
+		kfree(sta->ampdu_mlme.tid_start_tx[i]);
 		tid_tx = rcu_dereference_raw(sta->ampdu_mlme.tid_tx[i]);
 		if (!tid_tx)
 			continue;
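
As an aside, the suspected sequence (start requested, station destroyed before the work item runs) can be modelled in user space. This is only a sketch: the model_* names, the live_objects counter, and the free_pending switch are hypothetical and exist purely for the model, and NUM_TIDS = 16 is assumed to mirror IEEE80211_NUM_TIDS.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_TIDS 16 /* assumed to mirror IEEE80211_NUM_TIDS */

/* Hypothetical stand-ins for the session object and the station. */
struct model_tid_tx { int tid; };

struct model_sta {
	struct model_tid_tx *tid_start_tx[NUM_TIDS]; /* parked for the worker */
	struct model_tid_tx *tid_tx[NUM_TIDS];       /* collected by the worker */
};

static int live_objects; /* allocations that have not been freed yet */

/* Models the start path: allocate a session and park it. */
static int model_start_session(struct model_sta *sta, int tid)
{
	struct model_tid_tx *t;

	assert(tid >= 0 && tid < NUM_TIDS);
	if (sta->tid_tx[tid] || sta->tid_start_tx[tid])
		return -1; /* already in an aggregation flow */

	t = calloc(1, sizeof(*t));
	if (!t)
		return -1;
	t->tid = tid;
	live_objects++;
	sta->tid_start_tx[tid] = t;
	return 0;
}

/* Models the work item: move parked sessions to the normal array. */
static void model_session_work(struct model_sta *sta)
{
	int i;

	for (i = 0; i < NUM_TIDS; i++) {
		if (!sta->tid_start_tx[i])
			continue;
		sta->tid_tx[i] = sta->tid_start_tx[i];
		sta->tid_start_tx[i] = NULL;
	}
}

/*
 * Models station cleanup. With free_pending == 0 only the normal array is
 * freed, so anything still parked in tid_start_tx stays allocated.
 */
static void model_cleanup_sta(struct model_sta *sta, int free_pending)
{
	int i;

	for (i = 0; i < NUM_TIDS; i++) {
		if (free_pending && sta->tid_start_tx[i]) {
			free(sta->tid_start_tx[i]);
			sta->tid_start_tx[i] = NULL;
			live_objects--;
		}
		if (!sta->tid_tx[i])
			continue;
		free(sta->tid_tx[i]);
		sta->tid_tx[i] = NULL;
		live_objects--;
	}
}
```

In this model, a session that is started and then cleaned up with free_pending == 0 before model_session_work() runs leaves live_objects at 1, while the same sequence with free_pending == 1 brings it back to 0. A session the worker has already collected is freed either way.
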
Looks reasonable to me. I was about to start testing similar logic
in sta_info_free(), but likely your patch is more proper.
I'll give it a try now.
Thanks,
Ben
--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com