Re: ath9k(?): AP stops sending traffic to iPhone 4S until another 802.11n-capable STA joins


Okay, I've made much more progress on this old thread.  I haven't actually
fixed the bug, which I suspect is a race condition only on multicore
machines, but I at least have better reproduction steps and a workaround.

The bug seems to trigger when three things happen at once:
1) Background interference causes retries
2) AP wants to send data to the STA, which has been idle for a while
3) We want to negotiate a new BA session from AP to STA.

Sometimes, the background interference will cause the time between ADDBA
Request (from AP) and ADDBA Response (from STA) to be longer than usual.  In
my tests, it's usually <1ms, but in high-interference situations I've seen
it be >3ms.  Sometimes, when the delay is longer, I see the symptom that the
agg_status file for the station in question starts showing TID#0's "pending"
column increasing slowly, until it eventually reaches 64.  A wifi capture on
a separate sniffer indicates that no data is being transmitted to that
station, although traffic to other stations (and broadcast/multicast)
continues unabated.  I guess this means the device's queues are themselves
not stopped, but the station's per-TID aggregation queue is stuck.

Twiddling the agg_status of a different queue (in this case TID#1) unblocks
TID#0:
echo "tx start 1" >/sys/kernel/debug/ieee80211/phy0/.../agg_status

So does having another aggregation-capable device join the network.  Having
an 802.11g-only device join the network does *not* unblock the queue.

However, trying to stop TID#0 doesn't help (and it also doesn't successfully
stop the aggregation):
echo "tx stop 0" >/sys/kernel/debug/ieee80211/phy0/.../agg_status
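
In case it helps anyone hitting the same thing, the TID#1 workaround above
could be automated with something like the sketch below.  This is untested
against a real AP and rests on assumptions: the function names and the
limit argument are made up for illustration, and I'm assuming each per-TID
row of agg_status starts with the TID number and ends with the "pending"
count, which may not match your kernel.  The agg_status path is passed in by
the caller because it depends on the phy, interface and station.

```shell
#!/bin/sh
# Sketch only.  ASSUMPTION: each per-TID row of agg_status begins with the
# TID number and ends with the "pending" count.  Check your kernel's layout.

# tid_pending FILE TID: print the pending count for TID (hypothetical helper).
# Prints nothing if no row for that TID is found.
tid_pending() {
    awk -v tid="$2" \
        '$1 ~ /^[0-9]+$/ && $1 + 0 == tid + 0 { print $NF + 0; exit }' "$1"
}

# kick_if_stuck FILE [LIMIT]: if TID 0 has LIMIT (default 64) or more frames
# pending, write "tx start 1" to FILE, as in the manual workaround.
# Returns non-zero if the TID 0 row could not be parsed.
kick_if_stuck() {
    agg_file=$1
    limit=${2:-64}
    pending=$(tid_pending "$agg_file" 0)
    [ -n "$pending" ] || return 1
    if [ "$pending" -ge "$limit" ]; then
        echo "tx start 1" > "$agg_file"
    fi
}
```

Calling kick_if_stuck on the station's agg_status file every few seconds
would only paper over the stall.  It also starts an aggregation session on
TID#1 as a side effect, and a pending count of 64 is a crude heuristic for
"stuck", so treat this as a diagnostic aid rather than a fix.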

The following patch makes the problem easier to reproduce by letting you
turn the aggregation timeout way down.  For myself, I used a
default_agg_timeout of 500ms and just pinged repeatedly once per second from
the AP to STA.  This causes the aggregation sessions to be repeatedly
brought up and torn down, which triggers the problem for me within a few
minutes (when run on a channel with fairly high noise).
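
Roughly, the reproduction setup looks like the sketch below.  The helper
name is made up, and the location of the new debugfs file is deliberately
left to the caller rather than guessed here.  I'm also assuming the file
accepts a plain integer, with 500 standing for the 500ms mentioned above;
check the patch for the actual units.

```shell
#!/bin/sh
# Reproduction sketch.  The knob path and station address are placeholders
# supplied by the caller; nothing here is specific to a particular setup.

# set_agg_timeout FILE VALUE: write VALUE to the debugfs file added by the
# patch (hypothetical helper).  ASSUMPTION: the file takes a plain integer.
set_agg_timeout() {
    knob=$1
    value=$2
    [ -w "$knob" ] || { echo "cannot write $knob" >&2; return 1; }
    printf '%s\n' "$value" > "$knob"
}

# Example use on the AP (not run automatically):
#   set_agg_timeout "$KNOB" 500    # short timeout, so sessions churn
#   ping -i 1 "$STA_IP"            # one ping per second from AP to STA
```

With the timeout shorter than the ping interval, each ping should find the
previous BA session already torn down and trigger a fresh ADDBA exchange,
which is the window where the stall shows up.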

Changing default_agg_timeout to zero (as it is on most non-ath9k drivers)
makes the problem pretty much go away.  However, I think it's because I'm
just dodging the code path that triggers a race condition.

Notes:

- I'm using exactly the same ath9k driver (currently 20150525, but we've
  tried newer ones with no difference) on two totally different platforms: a
  dual-core mindspeed c2k host CPU (ARMv7) with separate ath9k, and a
  single-core QCA9531 (MIPS) with on-chip ath9k.

- I've been unable to trigger the problem on the single-core QCA9531, but I
  have on the dual-core c2k.

The aggregation code is... a little hairy.  Does anyone have any guesses
where I might look for the race condition?  Or better still, a patch I can
try?


Avery Pennarun (1):
  mac80211: add a debugfs var for the default aggregation timeout.

 net/mac80211/debugfs_netdev.c      | 4 ++++
 net/mac80211/rc80211_minstrel_ht.c | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)

-- 
2.7.0.rc3.207.g0ac5344
