On 3/26/21 5:18 PM, Ben Greear wrote:
I've been digging into a bug where our ath10k driver shows periodic throughput drops at regular intervals. We've bisected this down to a patch where we disable the firmware connection monitor and instead ask mac80211 to do the connection monitoring. This works fine in the 5.4 kernel, but in 5.11 it does not work well. First, if anyone has an idea what change might have caused this, please let me know. We will try with ath9k, assuming it uses the mac80211 connection monitor, to see if it has the same issue.
Ok, it took a while, but I bisected to this:

commit 9abf4e49830d606f18a05111cfa96b8f0b724c7d (HEAD, refs/bisect/good-9abf4e49830d606f18a05111cfa96b8f0b724c7d)
Author: Felix Fietkau <nbd@xxxxxxxx>
Date:   Tue Sep 8 14:36:56 2020 +0200

    mac80211: optimize station connection monitor

    Calling mod_timer for every rx/tx packet can be quite expensive.
    Instead of constantly updating the timer, we can simply let it run out
    and check the timestamp of the last ACK or rx packet to re-arm it.

    Signed-off-by: Felix Fietkau <nbd@xxxxxxxx>
    Link: https://lore.kernel.org/r/20200908123702.88454-9-nbd@xxxxxxxx
    Signed-off-by: Johannes Berg <johannes.berg@xxxxxxxxx>

To do the bisect, I copied my ath10k-ct driver from the 5.4 kernel (well tested driver) over whatever ath10k code was in the particular kernel commit I was testing. I tweaked the driver slightly to compile and work against stock kernel.

The failure case is that when in station mode, and transmitting UDP in upload direction (with a few packets per second of download traffic too), the traffic periodically goes to zero throughput every 30 seconds, and stays quiesced for about 5 seconds, and then resumes. The station stays connected.

In previous debugging, I noticed this only happens when my driver enables mac80211 connection monitoring. In a different bisect attempt, my driver hit the issue when changing how tx-descriptor count was configured, but I am not fully confident that is a root cause, and changing things a bit made that problem go away.

The problem is not seen with ath9k, nor stock ath10k. Stock ath10k uses in-firmware connection monitoring.

Felix, if you have any ideas of likely failure points, please let me know.

Thanks,
Ben
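
For reference, the idea in the commit message (stop re-arming the timer on every packet, let it expire, then re-arm from the last-activity timestamp) can be sketched as a small simulation. This is only an illustration and not the mac80211 code: the class names, the run() helper, the 30000 ms timeout, and the "probe" counter are all made up for this example.

```python
from dataclasses import dataclass

IDLE_TIMEOUT_MS = 30_000  # illustrative value, not taken from mac80211


@dataclass
class EagerMonitor:
    """Re-arms the timer on every packet (the costly pattern the commit removes)."""
    expires_at: int = IDLE_TIMEOUT_MS
    timer_updates: int = 0
    probes: int = 0

    def on_packet(self, now):
        self.expires_at = now + IDLE_TIMEOUT_MS
        self.timer_updates += 1

    def on_timer_expiry(self, now):
        # Timer ran out with no traffic: treat the link as idle and probe it.
        self.probes += 1
        self.expires_at = now + IDLE_TIMEOUT_MS
        self.timer_updates += 1


@dataclass
class LazyMonitor:
    """Packets only record a timestamp; the timer is corrected when it expires."""
    expires_at: int = IDLE_TIMEOUT_MS
    last_activity: int = 0
    timer_updates: int = 0
    probes: int = 0

    def on_packet(self, now):
        self.last_activity = now  # cheap: no timer manipulation here

    def on_timer_expiry(self, now):
        deadline = self.last_activity + IDLE_TIMEOUT_MS
        if deadline > now:
            # Traffic was seen since the timer was armed: just re-arm.
            self.expires_at = deadline
        else:
            # Genuinely idle for a full timeout: probe the link.
            self.probes += 1
            self.expires_at = now + IDLE_TIMEOUT_MS
        self.timer_updates += 1


def run(monitor, packet_times_ms, end_time_ms):
    """Replay packet arrival times, firing the timer whenever it comes due."""
    for t in sorted(packet_times_ms):
        while monitor.expires_at <= t:
            monitor.on_timer_expiry(monitor.expires_at)
        monitor.on_packet(t)
    while monitor.expires_at <= end_time_ms:
        monitor.on_timer_expiry(monitor.expires_at)
    return monitor
```

With one packet every 10 ms for 60 seconds, run(EagerMonitor(), range(0, 60_000, 10), 60_000) ends with 6000 timer updates, while the lazy variant ends with 2, and neither sends a probe. The lazy variant is only correct if every rx and ACK path updates the last-activity timestamp; a path that misses the update would look idle to the monitor even while traffic is flowing.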