On 12/8/2022 1:24 PM, Kalle Valo wrote:
Manikanta Pubbisetty <quic_mpubbise@xxxxxxxxxxx> writes:
On 11/23/2022 9:35 PM, Kalle Valo wrote:
Kalle Valo <kvalo@xxxxxxxxxx> writes:
Manikanta Pubbisetty <quic_mpubbise@xxxxxxxxxxx> writes:
Currently, WLAN chip is powered once during driver probe and is kept
ON (powered) always even when WLAN is not active; keeping the chip
powered ON all the time will consume extra power which is not
desirable for battery operated devices. Same is the case with non-WoW
suspend, chip will not be put into low power mode when the system is
suspended resulting in higher battery drain.
Send QMI MODE OFF command to firmware during WiFi OFF to put device
into low power mode.
Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-00887-QCAMSLSWPLZ-1
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16
Manikanta Pubbisetty (3):
ath11k: Fix double free issue during SRNG deinit
ath11k: Move hardware initialization logic to start()
ath11k: Enable low power mode when WLAN is not active
---
V3:
- Removed patch "ath11k: Fix failed to parse regulatory event print" as it is not needed anymore
- Fixed a potential deadlock scenario reported by lockdep around ab->core_lock with V2 changes
- Fixed other minor issues that were found during code review
- Spelling corrections in the commit messages
I still see a crash, immediately after the first rmmod:
Nov 22 11:05:47 nuc2 [ 139.378719] rmmod ath11k_pci
Nov 22 11:05:48 nuc2 [ 139.892395] general protection fault, probably
for non-canonical address 0xdffffc000000003e: 0000 [#1] PREEMPT SMP
DEBUG_PAGEALLOC KASAN
Nov 22 11:05:48 nuc2 [ 139.892453] KASAN: null-ptr-deref in range
[0x00000000000001f0-0x00000000000001f7]
Really odd that you don't see it. Unfortunately not able to debug this
further right now.
This is with:
wcn6855 hw2.0 WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.9
A bit more information how I see the crash. So first I have all modules
loaded:
$ lsmod
Module Size Used by
ath11k_pci 57344 0
ath11k 2015232 1 ath11k_pci
mac80211 3284992 1 ath11k
libarc4 16384 1 mac80211
cfg80211 2494464 2 ath11k,mac80211
qmi_helpers 57344 1 ath11k
qrtr_mhi 20480 0
mhi 217088 2 ath11k_pci,qrtr_mhi
qrtr 98304 5 qrtr_mhi
nvme 122880 3
nvme_core 299008 5 nvme
$
Then I just remove ath11k_pci module and boom:
$ sudo rmmod ath11k_pci
[ 153.658409] general protection fault, probably for non-canonical
address 0xdffffc000000003e: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
KASAN
This happens every time, there doesn't seem to be any randomness on the
behaviour.
Thanks for the help Kalle, this is exactly what I was doing in my
tests. Unfortunately, I'm not able to reproduce the problem. I have
also tried with the exact firmware that you have pointed out. Let me
see if I'm missing anything.
I tested this more and patch 3 seems to be the one causing the crash. I
didn't see this when patch 1-2 were applied.
The crash happens in ath11k_dp_process_rxdma_err() in this line:
srng = &ab->hal.srng_list[err_ring->ring_id];
ab looks sane to me (0xffff88814c960000) but err_ring is set to 0x200.
Does this help?
Thanks for your time and analysis on the bug.
From the callstack() that you have shared earlier, it looks like
ath11k_dp_process_rxdma_err() is called from dp_service_srngs which is a
napi poll callback.
To me it looks like the napi handler of the driver is getting called
after the srng resources have been de-initialized during rmmod.
I have closed examined the changes in patch 3 (I'm still doing it). so
far, I could not find anything that could trigger this kind of a
problem. Since NAPI is disabled upon rmmod, NAPI poll should not be called.
It's quite not clear if I'm missing something.
Thanks,
Manikanta