
Re: [REGRESSION] ath10k: failed to flush transmit queue

Hi All,

On 7/31/24 11:13 AM, Kalle Valo wrote:
Felix Fietkau <nbd@xxxxxxxx> writes:

On 12.07.24 04:23, Cedric Veilleux wrote:

AP mode.
Both 2.4 and 5ghz channels.
Using WLE600VX (QCA986x/988x), we are seeing the following errors in
kernel logs:
[12978.022077] ath10k_pci 0000:04:00.0: failed to flush transmit queue (skip 0 ar-state 1): 0
[13343.069189] ath10k_pci 0000:04:00.0: failed to flush transmit queue (skip 0 ar-state 1): 0
They are somewhat random but frequent. Can happen once a day or many
times per hour.
They are associated with 3-4 seconds of radio silence. Full packet
loss. Then everything resumes normally, STA are still associated and
traffic resumes.
I have tested with major kernel versions:
6.1.97: stable (tested for many days on 10+ access points)
6.2.16: stable (tested for few hours single machine)
6.3.13: stable (tested for few hours single machine)
6.4.16: unstable  (we have errors within an hour)
6.5.13: unstable  (we have errors within an hour)
6.6.39: unstable  (we have errors within an hour)
6.7.12: unstable  (we have errors within an hour)
6.8.10: unstable  (we have errors within an hour)
6.9.7: unstable  (we have errors within an hour)
From these tests I believe something changed in 6.4 series causing
instabilities and the dreaded "failed to flush transmit queue" error.
This is a custom linux distribution. Only change is the kernel. All
other packages are same versions. Everything rebuilt from source using
bitbake/yocto. Same linux-firmware files.
I'm pretty sure it's caused by this commit:

commit 0b75a1b1e42e07ae84e3a11d2368b418546e2bec
Author: Johannes Berg <johannes.berg@xxxxxxxxx>
Date:   Fri Mar 31 16:59:16 2023 +0200

     wifi: mac80211: flush queues on STA removal

I guess somebody needs to look into making the queue flush on ath10k
more reliable (or even better, implement a more lightweight .flush_sta
op).

I don't have time to do the work myself, but hopefully this
information could help somebody else take care of it.
Adding ath10k list so that everyone sees this.

I want to revive this thread and provide some additional data. This is not limited to AP mode or to the hardware mentioned above. After upgrading from 6.2 to 6.8, we started seeing this on client devices running the QCA6174 hw 3.2 with firmware ver WLAN.RM.4.4.1-00288- api 6. We see it during disconnects, which isn't a big deal; the more concerning case is during roams, where it makes roam time go from less than 200 ms to over 5 seconds.

Based on this report I have tried using Remi's set of patches [1], which implement flush_sta(), but we end up with the same ~5 second hang, just in ath10k_flush_sta() instead of ath10k_flush(). I'm unsure whether this is a firmware problem or some race within the driver itself. In the past I have reduced timeouts [2] to work around these types of things, but it's really just a band-aid.

I would agree that this was "introduced" by Johannes' commit above, but the original commit does make sense... This is just an ath10k problem with flushing the queues.

At this point I'm really left with two options:

 - Revert Johannes' commit that flushes the queues on STA removal, thereby reducing security, OR

 - Reduce the timeout from 5 seconds to something more manageable, like 1 second (hopefully someone more in the know can comment here).
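
To make the timeout trade-off concrete, here is a toy model in Python. It is not driver code: flush_wait() and its parameters are hypothetical names, the 20 ms and 1500 ms drain times are made-up figures, and treating a return of 0 as "timed out" (which would match the trailing ": 0" in the log lines) is my assumption about how the wait behaves.

```python
# Toy model of a bounded flush wait. Not ath10k code; all names are hypothetical.

def flush_wait(drain_ms, timeout_ms):
    """Model waiting for a TX queue to empty, giving up after timeout_ms.

    drain_ms: time the queue would need to empty, or None if it never empties.
    Returns (stall_ms, time_left_ms); time_left_ms == 0 means the wait timed out.
    """
    if drain_ms is not None and drain_ms < timeout_ms:
        # Queue emptied in time: the caller only stalls for the drain time.
        return drain_ms, timeout_ms - drain_ms
    # Queue did not empty in time: the caller stalls for the full timeout.
    return timeout_ms, 0


for timeout_ms in (5000, 1000):
    healthy = flush_wait(drain_ms=20, timeout_ms=timeout_ms)    # assumed figure
    stuck = flush_wait(drain_ms=None, timeout_ms=timeout_ms)    # never drains
    slow = flush_wait(drain_ms=1500, timeout_ms=timeout_ms)     # assumed figure
    print(f"timeout={timeout_ms}ms healthy={healthy} stuck={stuck} slow={slow}")
```

In this model a stuck queue always costs the full timeout, so a shorter timeout only bounds the stall; it does not fix the underlying flush. It also cuts off a queue that would have drained between 1 and 5 seconds, which is why I call it a band-aid.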

Has anyone else looked at this regression? Or does anyone have a workaround other than my options above?
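
For anyone who wants to check their own logs, below is a rough Python sketch that pulls these warnings out of dmesg-style text and reports the time between them. The regular expression is based only on the two log lines quoted earlier in this thread, and reading the last field as the remaining wait time is an assumption on my part. find_flush_failures() is a hypothetical helper name.

```python
import re

# Pattern derived from the log lines quoted in this thread.
# The meaning of the final field (remaining wait time) is an assumption.
FLUSH_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+ath10k_pci\s+(?P<dev>\S+):\s+"
    r"failed to flush transmit queue\s+"
    r"\(skip (?P<skip>-?\d+) ar-state (?P<state>-?\d+)\):\s+(?P<left>-?\d+)"
)


def find_flush_failures(log_text):
    """Return one dict per flush failure found in dmesg-style text."""
    # Logs pasted into mail are often re-wrapped, so collapse whitespace first.
    flat = " ".join(log_text.split())
    return [
        {
            "ts": float(m["ts"]),
            "dev": m["dev"],
            "skip": int(m["skip"]),
            "ar_state": int(m["state"]),
            "left": int(m["left"]),
        }
        for m in FLUSH_RE.finditer(flat)
    ]


# The two warnings from the original report, wrapped as they were in the mail.
sample = """\
[12978.022077] ath10k_pci 0000:04:00.0: failed to flush transmit
queue
(skip 0 ar-state 1): 0
[13343.069189] ath10k_pci 0000:04:00.0: failed to flush transmit queue
(skip 0 ar-state 1): 0
"""

events = find_flush_failures(sample)
gaps = [round(b["ts"] - a["ts"], 3) for a, b in zip(events, events[1:])]
print(len(events), "failures; seconds between them:", gaps)
```

Feeding it the output of dmesg from an affected machine should give a quick picture of how often the flush fails, which would help compare kernels or timeout values.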

Thanks,

James

[1] https://lore.kernel.org/linux-wireless/17d26d6a3e80ff03939ee7935fdc07f979b61a4f.1732293922.git.repk@xxxxxxxxxxxx/

[2] https://lore.kernel.org/linux-wireless/20240814164507.996303-2-prestwoj@xxxxxxxxx/




