Hi Kalle,
On 11/21/23 06:15, Kalle Valo wrote:
James Prestwood <prestwoj@xxxxxxxxx> writes:
Hi,
I've noticed a somewhat rare behavior (<1% of the time) where bringing
an interface up fails after disabling power save. This is done by IWD
when a user-option is enabled, and follows an interface
removal/creation. The sequence is:
1. Remove default interface
2. Create new interface
3. Disable power save on new interface
4. Bring up new interface (rarely fails with -110).
iwd[1286641]: src/netdev.c:netdev_disable_ps_cb() Disabled power save
for ifindex 54
kernel: ath10k_pci 0000:02:00.0: wmi service ready event not received
iwd[1286641]: Error bringing interface 54 up: Connection timed out
kernel: ath10k_pci 0000:02:00.0: Could not init core: -110
So I guess you have a test setup to reproduce this easily? How long does
it take to see this?
It really depends. I was letting IWD do the work and waiting excessively
which could sometimes take an hour. But I managed to get it to happen 20
seconds apart using ip/iw in a loop. On average it usually happens
within 5-10 minutes, sometimes sooner.
$ dmesg --follow & $ for i in $(seq 1 4000); do sudo ip link set wlan0
down; sudo iw dev wlan0 set power_save off; sudo ip link set wlan0 up; done
[ 882.927218] ath10k_pci 0000:02:00.0: wmi service ready event not
received RTNETLINK answers: Connection timed out [ 883.016550]
ath10k_pci 0000:02:00.0: Could not init core: -110 [ 904.434930]
ath10k_pci 0000:02:00.0: wmi service ready event not received RTNETLINK
answers: Connection timed out [ 904.524287] ath10k_pci 0000:02:00.0:
Could not init core: -110
What hardware and firmware version are you using?
[ 5.680909] ath10k_pci 0000:02:00.0: qca6174 hw3.2 target 0x05030000
chip_id 0x00340aff sub 1dac:0261
[ 5.680921] ath10k_pci 0000:02:00.0: kconfig debug 0 debugfs 1
tracing 1 dfs 0 testmode 0
[ 5.681607] ath10k_pci 0000:02:00.0: firmware ver
WLAN.RM.4.4.1-00309- api 6 features wowlan,ignore-otp,mfp crc32 0793bcf2
I've seen this on 00241, 00288, and the latest firmware above.
I'm thinking maybe there is a race between creating the fresh
interface and disabling PS, then bringing the interface back up?
Yeah, looks like it. If it's relatively easy for you to reproduce this
you could try adding delays to suitable spots in ath10k. That might be
give more ideas where the race is exactly.
Ok, I can add a delay ahead of the WMI request that's timing out and see
what happens.
Setting PS after the interface is up (so far) seems to be 100%
reliable and have not yet seen this behavior. Regardless of any kernel
fixes I'll need to reorder the command sequence in IWD to support
existing kernels, but before I work around this issue I wanted to
point it out so its not hidden from view.
To me working around this in user space doesn't sound like a good idea.
I would rather have a workaround in the kernel (but obviously I hope we
can find a proper fix).
Yep, totally agree.
Thanks,
James