Re: kmemleak report related to ieee80211_start_tx_ba_session, tid_start_tx locking issues?

Ben Greear <greearb@xxxxxxxxxxxxxxx> · Wed, 12 Jun 2013 13:58:41 -0700

On 06/12/2013 01:46 PM, Johannes Berg wrote:
On Wed, 2013-06-12 at 11:21 -0700, Ben Greear wrote:

In ieee80211_start_tx_ba_session we are accessing and assigning the tid_start_tx
without holding the ampdu_mlme.mtx mutex.

	spin_lock_bh(&sta->lock);
.....
	tid_tx = rcu_dereference_protected_tid_tx(sta, tid);
	/* check if the TID is not in aggregation flow already */
	if (tid_tx || sta->ampdu_mlme.tid_start_tx[tid]) {

....

	/*
	 * Finally, assign it to the start array; the work item will
	 * collect it and move it to the normal array.
	 */
	sta->ampdu_mlme.tid_start_tx[tid] = tid_tx;


Elsewhere, in ieee80211_ba_session_work, we access the tid_start_tx
without the sta->lock held, but with the ampdu_mlme.mtx held.

Yeah, that seems wrong.

I think we should probably hold ampdu_mlme.mtx in ieee80211_start_tx_ba_session
or make sure we hold sta->lock in ieee80211_ba_session_work.

Can't hold the mutex there, but we can do the lock (I'll comment on your
patch separately)

unreferenced object 0xffff880219b4de40 (size 192):
    comm "softirq", pid 0, jiffies 4296416789 (age 1257.971s)
    hex dump (first 32 bytes):
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<ffffffff815bc02c>] kmemleak_alloc+0x73/0x98
      [<ffffffff8117d4b4>] slab_post_alloc_hook+0x28/0x2a
      [<ffffffff8117f4a6>] kmem_cache_alloc_trace+0xa5/0xcc
      [<ffffffffa0365221>] ieee80211_start_tx_ba_session+0x24b/0x360 [mac80211]
      [<ffffffffa03a98f3>] minstrel_ht_tx_status+0x79a/0x7a9 [mac80211]
      [<ffffffffa035d1cd>] ieee80211_tx_status+0x3af/0x947 [mac80211]

When did this report get printed?

I have a system with 100 or so stations constantly trying to
associate with a set of APs that can handle < 100.  This
effectively causes constant churn of re-associations and
associated logic...

Good for shaking out bugs it seems :)

These and other leaks show up after a few minutes of
running this test scenario.  It's not a huge number of
leaks, however...so usually stations go away w/out leaking.

I have a feeling what happens is that start is requested, and then
before ieee80211_ba_session_work() gets a chance to run the station is
destroyed.

Should probably have something like this:

diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index b429798..aaf68d2 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -149,6 +149,7 @@ static void cleanup_single_sta(struct sta_info *sta)
  	 * directly by station destruction.
  	 */
  	for (i = 0; i < IEEE80211_NUM_TIDS; i++) {
+		kfree(sta->ampdu_mlme.tid_start_tx[i]);
  		tid_tx = rcu_dereference_raw(sta->ampdu_mlme.tid_tx[i]);
  		if (!tid_tx)
  			continue;

Looks reasonable to me.  I was about to start testing similar logic
in sta_info_free(), but likely your patch is more proper.

I'll give it a try now.

Thanks,
Ben

--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc  http://www.candelatech.com

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html