Hello everyone,
I've encountered a possible issue in a DD-WRT [1] setup where broadcast
packets stop being delivered after a GTK (Group Temporal Key) exchange.
This issue occurs on a system with the following hardware:
Access Point Hardware: DynaLink DL-WRX36
Router Software: DD-WRT v3.0-r58819 std (12/13/24)
CPU: Qualcomm IPQ8072A
WiFi Chips: Qualcomm QCN5024 and Qualcomm QCN5054
WiFi Driver: ath11k
Firmware: WLAN.HK.2.12-01460-QCAHKSWPL_SILICONZ-1
NSS FW version: NSS.FW.12.5-210-HK.R
Kernel: Linux WL-AP-EG 6.6.64-rt29 #1791 SMP Thu Dec 12 16:41:51
+07 2024 aarch64 DD-WRT
After a GTK exchange, the AP can get into a "weird state". While it is
in this state, broadcast and multicast frames such as ARP or mDNS are no
longer reliably delivered to connected clients, while unicast frames
still get through. In this "weird state", the channel quality (active
time vs. busy time) goes down and latencies to the still-reachable WiFi
clients rise.
I've come across a related bug report on GitHub that describes a similar
issue:
https://github.com/openwrt/openwrt/issues/9555#issuecomment-2433857175
Unfortunately, the GitHub discussion drifted towards various other
possible bugs.
In the meantime, I have done a lot of additional debugging, but I am
coming to a dead end due to my limited knowledge of the ath11k driver
and firmware internals. Interestingly, the AP can get back from the
"weird state" to the "normal state" after another GTK rekey event. So
far, I've seen this behavior only in the 5 GHz band (using non-DFS
channels).
My questions to the Linux wireless experts and developers in this community:
· Is this behavior known with ath11k on IPQ8072A or on the mentioned
WiFi chips (QCN5024/QCN5054)?
· Could this be a driver or firmware issue that specifically arises
after a GTK or even GMK exchange?
· What can I do to debug it further? I've switched on debugging in
"hostapd" in order to see the keying events. Are there lower-level
logs I can get from the WiFi chip and match against the latency and key
exchange observations?
· Is there any additional information I can or should provide to give
the devs more insight into this issue?
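To make the matching of key exchange events and latency observations more
concrete, here is a minimal Python sketch of one way it could be done. It
is only an illustration under assumptions: the log lines are assumed to
start with an epoch timestamp followed by ": ", the marker string
"rekeying GTK" is an assumed placeholder for whatever the hostapd debug
log actually prints, the ping samples are assumed to be collected
separately as (timestamp, RTT in ms) pairs, and the function names are
hypothetical.

```python
from bisect import bisect_right
from statistics import median


def parse_rekey_times(log_lines, marker="rekeying GTK"):
    """Return sorted epoch timestamps of log lines containing `marker`.

    Assumption: each relevant line starts with "<epoch seconds>: ".
    Both this format and the default marker are placeholders; adjust
    them to the real log output.
    """
    times = []
    for line in log_lines:
        if marker not in line:
            continue
        stamp, sep, _ = line.partition(": ")
        if not sep:
            continue
        try:
            times.append(float(stamp))
        except ValueError:
            continue  # no leading numeric timestamp on this line
    return sorted(times)


def median_rtt_per_interval(rekey_times, samples):
    """Group (timestamp, rtt_ms) samples by rekey interval.

    Interval 0 covers everything before the first rekey, interval 1
    the time between the first and second rekey, and so on. Returns
    a dict mapping interval index to the median RTT in that interval.
    """
    buckets = {}
    for ts, rtt in samples:
        idx = bisect_right(rekey_times, ts)
        buckets.setdefault(idx, []).append(rtt)
    return {idx: median(rtts) for idx, rtts in sorted(buckets.items())}


if __name__ == "__main__":
    # Made-up example data, only to show the call pattern.
    log = [
        "1734000000.000000: wlan0: WPA: rekeying GTK",
        "1734000300.000000: wlan0: some unrelated message",
        "1734000600.000000: wlan0: WPA: rekeying GTK",
    ]
    pings = [
        (1733999990.0, 2.0),
        (1734000010.0, 40.0),
        (1734000020.0, 60.0),
        (1734000610.0, 3.0),
    ]
    print(median_rtt_per_interval(parse_rekey_times(log), pings))
```

If the median RTT jumps at one interval boundary and drops back at a
later one, that would support the observation that the AP enters and
leaves the "weird state" exactly at GTK rekey events.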
When I replace the DynaLink DL-WRX36 AP with a Netgear R7800 AP (CPU:
QCA IPQ8065), its predecessor, the problem disappears without any
change to the clients.
Thank you in advance for any insights or experiences regarding this issue.
Best regards,
Steffen
[1] https://dd-wrt.com/
--
✂-----------------------------------------------------------------------
Dipl.-Inf. Steffen Moser Tel (Office): +49.731.50.32407
School of Advanced Professional Studies Ulm University, Room: 1013
https://wissenschaftliche-weiterbildung.org/ Oberberghof 7, 89081 Ulm
https://saps.uni-ulm.de/ Germany