Hi,
So I have no resolution to this (trying to get the AP vendor to chase it
down), but I'm toying with the idea of trying to work around whatever
issue the AP is having when this occurs. The only thing I can think of
is that there is a 3 second delay between the authentication and
reassociation, and perhaps this is causing some timeout in the AP and in
turn the deauth.
I'm wondering how long it should take to add/remove a key from the
firmware? 3 seconds seems very long, and I question if this timeout is
really necessary or was just chosen arbitrarily? Is this something that
could be lowered down to e.g. 1 second without negative impacts? The
code in question is in ath10k_install_key:
ret = ath10k_send_key(arvif, key, cmd, macaddr, flags);
if (ret)
return ret;
time_left = wait_for_completion_timeout(&ar->install_key_done, 3 * HZ);
if (time_left == 0)
return -ETIMEDOUT;
Thanks,
James
On 7/15/24 4:54 AM, James Prestwood wrote:
I forgot to mention:
QCA6174 hw3.0 firmware WLAN.RM.4.4.1-00288-
The higher rate of frequency is happening on kernel 5.15, although as
I said only at one location with a different AP vendor. We have many
other 5.15 devices with significantly less instances of this
happening. I also checked a few of our newer software releases using
kernel 6.2, and the timeout occurred there as well, but no real impact
(no disconnect, no assoc timeout).
On 7/12/24 6:11 AM, James Prestwood wrote:
Hi,
I've seen this error mentioned on random forum posts, but its always
associated with a kernel crash/warning or some very obvious negative
behavior. I've noticed this occasionally and at one location very
frequently during FT roaming, specifically just after CMD_ASSOCIATE
is issued. For our company run networks I'm not seeing any negative
behavior apart from a 3 second delay in sending the re-association
frame since the kernel waits for this timeout. But we have some
networks our clients run on that we do not own (different vendor),
and we are seeing association timeouts after this error occurs and in
some cases the AP is sending a deauthentication with reason code 8
instead of replying with a reassociation reply and an error status,
which is quite odd.
We are chasing down this with the vendor of these APs as well, but
the behavior always happens after we see this key removal
failure/timeout on the client side. So it would appear there is
potentially a problem on both the client and AP. My guess is
_something_ about the re-association frame changes when this error is
encountered, but I cannot see how that would be the case. We are
working to get PCAPs now, but its through a 3rd party, so that timing
is out of my control.
From the kernel code this error would appear innocuous, the old key
is failing to be removed but it gets immediately replaced by the new
key. And we don't see that addition failing. Am I understanding that
logic correctly? I.e. this logic:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/mac80211/key.c#n503
Below are a few kernel logs of the issue happening, some with the
deauth being sent by the AP, some with just timeouts:
--- No deauth frame sent, just association timeouts after the error ---
Jul 11 00:05:30 kernel: wlan0: disconnect from AP <previous BSS> for
new assoc to <new BSS>
Jul 11 00:05:33 kernel: ath10k_pci 0000:02:00.0: failed to install
key for vdev 0 peer <previous BSS>: -110
Jul 11 00:05:33 kernel: wlan0: failed to remove key (0, <previous
BSS>) from hardware (-110)
Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 1/3)
Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 2/3)
Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 3/3)
Jul 11 00:05:33 kernel: wlan0: association with <new BSS> timed out
Jul 11 00:05:36 kernel: wlan0: authenticate with <new BSS>
Jul 11 00:05:36 kernel: wlan0: send auth to <new BSS>a (try 1/3)
Jul 11 00:05:36 kernel: wlan0: authenticated
Jul 11 00:05:36 kernel: wlan0: associate with <new BSS> (try 1/3)
Jul 11 00:05:36 kernel: wlan0: RX AssocResp from <new BSS>
(capab=0x1111 status=0 aid=16)
Jul 11 00:05:36 kernel: wlan0: associated
--- Deauth frame sent amidst the association timeouts ---
Jul 11 00:43:18 kernel: wlan0: disconnect from AP <previous BSS> for
new assoc to <new BSS>
Jul 11 00:43:21 kernel: ath10k_pci 0000:02:00.0: failed to install
key for vdev 0 peer <previous BSS>: -110
Jul 11 00:43:21 kernel: wlan0: failed to remove key (0, <previous
BSS>) from hardware (-110)
Jul 11 00:43:21 kernel: wlan0: associate with <new BSS> (try 1/3)
Jul 11 00:43:21 kernel: wlan0: deauthenticated from <new BSS> while
associating (Reason: 8=DISASSOC_STA_HAS_LEFT)
Jul 11 00:43:24 kernel: wlan0: authenticate with <new BSS>
Jul 11 00:43:24 kernel: wlan0: send auth to <new BSS> (try 1/3)
Jul 11 00:43:24 kernel: wlan0: authenticated
Jul 11 00:43:24 kernel: wlan0: associate with <new BSS> (try 1/3)
Jul 11 00:43:24 kernel: wlan0: RX AssocResp from <new BSS>
(capab=0x1111 status=0 aid=101)
Jul 11 00:43:24 kernel: wlan0: associated