On 01/29/2018 02:35 PM, Toke Høiland-Jørgensen wrote:
Ben Greear <greearb@xxxxxxxxxxxxxxx> writes:
On 01/29/2018 01:47 PM, Toke Høiland-Jørgensen wrote:
Ben Greear <greearb@xxxxxxxxxxxxxxx> writes:
On 01/27/2018 05:11 AM, Toke Høiland-Jørgensen wrote:
Ben Greear <greearb@xxxxxxxxxxxxxxx> writes:
I'm doing a test with 200 virtual stations on each of 6 ath9k radios.
When I configure stations for DHCP, I sometimes see cases where stations on a particular
radio do not transmit anything. I see no 'XMIT' logs indicating that the driver
received frames from the upper stack, but if I use 'tshark' on
a station interface, it shows frames being 'transmitted'.
I do, however, see this, which looks like it might show
an issue. It looks like whatever 'aqm' is, it has an ever-expanding number
of backlog packets:
The aqm refers to the intermediate queues in mac80211. So a growing backlog
there indicates that the driver is not pulling packets for transmission.
With that many stations, I wonder whether it is due to the airtime
fairness scheduler throttling the station? What are the contents of
debug/ieee80211/wiphy2/netdev\:sta30194/stations/00\:0e\:8e\:69\:b8\:f7/airtime
while the station is not transmitting? And is it all stations on that
particular radio, or only some of them?
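To answer the "all stations or only some" question for a whole radio, the per-station airtime files can be dumped in one pass. This is only a sketch: the function name dump_airtime is made up for illustration, and the directory layout is assumed to match the paths quoted in this thread (netdev:<iface>/stations/<mac>/airtime under the wiphy directory).

```sh
# Print the airtime file of every station under one wiphy debugfs directory.
# Hypothetical helper; the layout is assumed from the paths in this thread.
# Usage: dump_airtime /sys/kernel/debug/ieee80211/wiphy0
dump_airtime() {
    wiphy_dir=$1
    for f in "$wiphy_dir"/netdev:*/stations/*/airtime; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -r "$f" ] || continue
        echo "== $f"
        cat "$f"
    done
}
```

Running it once while stations are hung and once while they are passing traffic gives two dumps that can be compared per station.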
Here is the output of airtime and aqm on a hung station:
# cat /debug/ieee80211/wiphy0/netdev\:sta10057/stations/00\:0e\:8e\:50\:74\:8a/airtime
RX: 83706 us
TX: 4202 us
Deficit: VO: 198 us VI: 300 us BE: -8306 us BK: 300 us
Right. This looks like incoming traffic is depleting the airtime quantum
faster than it can be replenished by the scheduler, which means that the
station gets completely starved.
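A negative value on the Deficit line is the sign of this kind of starvation. As a rough sketch, the access categories with a negative deficit can be picked out of an airtime dump with awk. The function name negative_deficits is hypothetical, and the parsing assumes the exact three-field "AC: value us" layout shown in the output above.

```sh
# Read airtime debugfs text on stdin and print "AC deficit" for every
# access category whose deficit is negative.
# Hypothetical helper; assumes the "Deficit: VO: 198 us VI: ..." layout.
negative_deficits() {
    awk '/^Deficit:/ {
        # After "Deficit:" the fields come in triples: "VO:" "198" "us".
        for (i = 2; i + 1 <= NF; i += 3) {
            ac = $i
            sub(/:$/, "", ac)
            if ($(i + 1) + 0 < 0)
                print ac, $(i + 1)
        }
    }'
}
```

For example, `cat .../airtime | negative_deficits` on the hung station above would report only the BE category.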
Could you try turning off the airtime scheduler?
echo 0 > /sys/kernel/debug/ieee80211/wiphy0/ath9k/airtime_flags
and see if the problem goes away.
If it does, please check if the problem persists when setting
airtime_flags to 1 (which means only include TX airtime).
-Toke
That did not seem to help:
# cat /debug/ieee80211/wiphy0/netdev\:sta10058/stations/00\:0e\:8e\:50\:74\:8a/node_aggr
Max-AMPDU: 65535
MPDU Density: 8
TID SEQ_START SEQ_NEXT BAW_SIZE BAW_HEAD BAW_TAIL BAR_IDX SCHED HAS-QUED
0 0 0 64 0 0 -1 1 1
Hmm, SCHED and HAS-QUED are both set, so it should be scheduled. Is the
scheduler maybe simply taking too long to get round to scheduling that
station again?
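The same check can be applied to many stations by filtering each node_aggr dump for TIDs that have both flags set. This is a sketch only: the function name is hypothetical, and it assumes the whitespace-separated table with a header row starting with TID, as in the output quoted above.

```sh
# Read node_aggr text on stdin and print the TIDs that have both
# SCHED and HAS-QUED set to 1.
# Hypothetical helper; assumes the table layout quoted in this thread.
queued_and_scheduled_tids() {
    awk '
        $1 == "TID" {
            # Locate columns by header name rather than by fixed position.
            for (i = 1; i <= NF; i++) col[$i] = i
            in_table = 1
            next
        }
        in_table && ("SCHED" in col) && ("HAS-QUED" in col) {
            if ($(col["SCHED"]) == 1 && $(col["HAS-QUED"]) == 1)
                print $1
        }
    '
}
```

A station that keeps reporting the same TID here while its backlog grows would point at the scheduler never getting round to it, rather than at the queue state itself.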
What happens if you don't kill things after 30 seconds? Is it hanging
forever, or just long enough for your tools to lose patience?
If you have 200 stations all requesting DHCP addresses I could see how
things might take a while...
I bring them up in groups of 30 or so. I typically see 1-10 of them get a
DHCP address, and then it seems that no data frames are ever tx'd again on
any interface on the radio...or at least tx is very rare. Sometimes, all 200 will come
up and pass traffic, but not reliably. Once the system gets in this state,
down/up of the affected station interfaces does not fix it. I have not tried
bouncing all of them at once yet.
When sniffing on another machine, I never even see DHCP discovers on the air
from any interface once it is hung, so it should not be a simple over-busy
network issue.
Maybe there is some way for the scheduler to get stuck and not schedule anything?
Thanks,
Ben
-Toke
--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com