On 12.07.24 04:23, Cedric Veilleux wrote:
AP mode.
Both 2.4 and 5 GHz channels.
Using WLE600VX (QCA986x/988x), we are seeing the following errors in
kernel logs:
[12978.022077] ath10k_pci 0000:04:00.0: failed to flush transmit queue (skip 0 ar-state 1): 0
[13343.069189] ath10k_pci 0000:04:00.0: failed to flush transmit queue (skip 0 ar-state 1): 0
They occur at random but frequently: anywhere from once a day to many
times per hour.
Each one coincides with 3-4 seconds of radio silence and full packet
loss. Then everything resumes normally: STAs are still associated and
traffic flows again.
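To quantify how often the stalls occur, the error lines can be pulled out of a saved kernel log and the gaps between them computed. The following is a minimal sketch, assuming the log is available as text (for example captured from dmesg) with the usual "[seconds.microseconds]" timestamp prefix. The function names are hypothetical and not part of any existing tool.

```python
import re

# Matches the first line of the ath10k error quoted above. The PCI device
# address is captured so that several radios can be told apart.
FLUSH_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+ath10k_pci\s+(?P<dev>\S+):\s+"
    r"failed to flush transmit queue"
)

def flush_failures(log_text):
    """Return a list of (timestamp_seconds, device) for each flush failure."""
    events = []
    for line in log_text.splitlines():
        m = FLUSH_RE.match(line.strip())
        if m:
            events.append((float(m.group("ts")), m.group("dev")))
    return events

def gaps(events):
    """Return the time in seconds between consecutive failures."""
    ts = [t for t, _ in events]
    return [round(b - a, 6) for a, b in zip(ts, ts[1:])]
```

Running flush_failures() over the two log lines above yields two events on device 0000:04:00.0, roughly 365 seconds apart.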
I have tested with the following kernel versions:
6.1.97: stable (tested for many days on 10+ access points)
6.2.16: stable (tested for a few hours on a single machine)
6.3.13: stable (tested for a few hours on a single machine)
6.4.16: unstable (we have errors within an hour)
6.5.13: unstable (we have errors within an hour)
6.6.39: unstable (we have errors within an hour)
6.7.12: unstable (we have errors within an hour)
6.8.10: unstable (we have errors within an hour)
6.9.7: unstable (we have errors within an hour)
From these tests I believe something changed in the 6.4 series, causing
instability and the dreaded "failed to flush transmit queue" error.
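The step from the test matrix to "something changed in 6.4" can be sketched in code. This is only an illustration, assuming the results form a clean stable-then-unstable split. The helper names are hypothetical.

```python
# Results as reported above: (kernel version, stable?)
RESULTS = [
    ("6.1.97", True), ("6.2.16", True), ("6.3.13", True),
    ("6.4.16", False), ("6.5.13", False), ("6.6.39", False),
    ("6.7.12", False), ("6.8.10", False), ("6.9.7", False),
]

def version_key(version):
    """Turn '6.4.16' into (6, 4, 16) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))

def regression_window(results):
    """Return (last_good, first_bad), or None if there is no clean split."""
    ordered = sorted(results, key=lambda r: version_key(r[0]))
    flags = [ok for _, ok in ordered]
    # Count the leading run of stable results.
    n_good = flags.index(False) if False in flags else len(flags)
    # Require at least one good and one bad result, and no good result
    # after the first bad one.
    if n_good == 0 or n_good == len(flags) or any(flags[n_good:]):
        return None
    return ordered[n_good - 1][0], ordered[n_good][0]
```

For the results above, regression_window(RESULTS) gives ("6.3.13", "6.4.16"). That range is a natural starting point for a git bisect, for example between the v6.3 and v6.4 tags, assuming the regression is already present in v6.4 itself.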
This is a custom Linux distribution. The only change is the kernel. All
other packages are the same versions. Everything is rebuilt from source
using bitbake/yocto, with the same linux-firmware files.
I'm pretty sure it's caused by this commit:
commit 0b75a1b1e42e07ae84e3a11d2368b418546e2bec
Author: Johannes Berg <johannes.berg@xxxxxxxxx>
Date:   Fri Mar 31 16:59:16 2023 +0200

    wifi: mac80211: flush queues on STA removal
I guess somebody needs to look into making the queue flush on ath10k
more reliable (or even better, implement a more lightweight .flush_sta op).
I don't have time to do the work myself, but hopefully this information
could help somebody else take care of it.
- Felix