Miaoqing Pan <quic_miaoqing@xxxxxxxxxxx> writes: > In case of firmware assert snapshot of firmware memory is essential for > debugging. Add firmware coredump collection support for PCI bus. > Collect RDDM and firmware paging dumps from MHI and pack them in TLV > format and also pack various memory shared during QMI phase in separate > TLVs. Add necessary header and share the dumps to user space using dev > coredump framework. Coredump collection is disabled by default and can > be enabled using menuconfig. Dump collected for a radio is 55 MB > approximately. > > The changeset is mostly copied from: > https://lore.kernel.org/all/20240325183414.4016663-1-quic_ssreeela@xxxxxxxxxxx/. > > Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-04358-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1 > > Signed-off-by: Miaoqing Pan <quic_miaoqing@xxxxxxxxxxx> This has a similar KASAN warning as did the ath12k patch: [ 48.364483] modprobe (1116) used greatest stack depth: 21760 bytes left [ 48.450859] ath11k_pci 0000:06:00.0: chip_id 0x2 chip_family 0xb board_id 0x106 soc_id 0x400c0200 [ 48.451252] ath11k_pci 0000:06:00.0: fw_version 0x11088c35 fw_build_timestamp 2024-04-17 08:34 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41 [ 63.694922] ath11k_pci 0000:06:00.0: simulating firmware assert crash [ 64.118179] ath11k_pci 0000:06:00.0: firmware crashed: MHI_CB_EE_RDDM [ 64.132388] ================================================================== [ 64.132470] BUG: KASAN: vmalloc-out-of-bounds in ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci] [ 64.132530] Write of size 4 at addr ffffc9000497520c by task kworker/u32:2/88 [ 64.132578] [ 64.132610] CPU: 5 PID: 88 Comm: kworker/u32:2 Not tainted 6.10.0-rc6-wt-ath+ #1678 [ 64.132659] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0067.2021.0528.1339 05/28/2021 [ 64.132719] Workqueue: ath11k_aux_wq ath11k_core_reset [ath11k] [ 64.132791] Call Trace: [ 64.132827] <TASK> [ 64.132867] dump_stack_lvl+0x7d/0xe0 [ 64.132910] print_address_description.constprop.0+0x33/0x3a0 [ 64.132958] print_report+0xb5/0x260 [ 64.133038] ? kasan_addr_to_slab+0xd/0x80 [ 64.133096] kasan_report+0xd8/0x110 [ 64.133132] ? ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci] [ 64.133179] ? ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci] [ 64.133225] __asan_report_store_n_noabort+0x12/0x20 [ 64.133266] ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci] [ 64.133311] ? ath11k_pci_coredump_calculate_size+0x710/0x710 [ath11k_pci] [ 64.133358] ? lock_sync+0x1a0/0x1a0 [ 64.133398] ath11k_coredump_collect+0x60/0x73 [ath11k] [ 64.133466] ath11k_core_reset+0x225/0x640 [ath11k] [ 64.133524] ? debug_smp_processor_id+0x17/0x20 [ 64.133564] process_one_work+0x8cc/0x19c0 [ 64.133893] ? pwq_dec_nr_in_flight+0x580/0x580 [ 64.133934] ? move_linked_works+0x128/0x2c0 [ 64.134007] ? assign_work+0x15e/0x270 [ 64.134074] worker_thread+0x715/0x1230 [ 64.134114] ? __this_cpu_preempt_check+0x13/0x20 [ 64.134153] ? lockdep_hardirqs_on+0x7d/0x100 [ 64.134192] ? rescuer_thread+0xdb0/0xdb0 [ 64.134229] kthread+0x2fa/0x3f0 [ 64.134266] ? kthread_insert_work_sanity_check+0xd0/0xd0 [ 64.134308] ret_from_fork+0x31/0x70 [ 64.134345] ? kthread_insert_work_sanity_check+0xd0/0xd0 [ 64.134386] ret_from_fork_asm+0x11/0x20 [ 64.134426] </TASK> [ 64.134459] [ 64.134498] The buggy address belongs to the virtual mapping at#012[ 64.134498] [ffffc90003965000, ffffc90004977000) created by:#012[ 64.134498] ath11k_pci_coredump_download+0x144/0x12c0 [ath11k_pci] [ 64.134576] [ 64.134606] The buggy address belongs to the physical page: [ 64.134648] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13faee [ 64.134699] flags: 0x200000000000000(node=0|zone=2) [ 64.134746] raw: 0200000000000000 0000000000000000 dead000000000122 0000000000000000 [ 64.134796] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 [ 64.135523] page dumped because: kasan: bad access detected [ 64.136273] [ 64.136928] Memory state around the buggy address: [ 64.137641] ffffc90004975100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 64.138361] ffffc90004975180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 64.139074] >ffffc90004975200: 00 04 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 [ 64.139742] ^ [ 64.140495] ffffc90004975280: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 [ 64.141222] ffffc90004975300: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 [ 64.141892] ================================================================== [ 64.142674] Disabling lock debugging due to kernel taint [ 64.143630] ath11k_pci 0000:06:00.0: Uploading coredump > --- > v2: fix implicit declaration of function 'vzalloc'. > > drivers/net/wireless/ath/ath11k/Kconfig | 11 ++ > drivers/net/wireless/ath/ath11k/Makefile | 1 + > drivers/net/wireless/ath/ath11k/core.c | 2 + > drivers/net/wireless/ath/ath11k/core.h | 5 + > drivers/net/wireless/ath/ath11k/coredump.c | 52 ++++++ > drivers/net/wireless/ath/ath11k/coredump.h | 79 +++++++++ > drivers/net/wireless/ath/ath11k/hif.h | 7 + > drivers/net/wireless/ath/ath11k/mhi.c | 5 + > drivers/net/wireless/ath/ath11k/mhi.h | 1 + > drivers/net/wireless/ath/ath11k/pci.c | 191 +++++++++++++++++++++ > drivers/net/wireless/ath/ath11k/qmi.c | 45 ++--- > drivers/net/wireless/ath/ath11k/qmi.h | 9 +- > 12 files changed, 384 insertions(+), 24 deletions(-) > create mode 100644 drivers/net/wireless/ath/ath11k/coredump.c > create mode 100644 drivers/net/wireless/ath/ath11k/coredump.h I feel that the QMI changes should be in a separat patch and explaining in detail what they are about. Didn't review those now as there's no explanation. > diff --git a/drivers/net/wireless/ath/ath11k/Kconfig b/drivers/net/wireless/ath/ath11k/Kconfig > index 27f0523bf967..bb91da0098b4 100644 > --- a/drivers/net/wireless/ath/ath11k/Kconfig > +++ b/drivers/net/wireless/ath/ath11k/Kconfig > @@ -57,3 +57,14 @@ config ATH11K_SPECTRAL > Enable ath11k spectral scan support > > Say Y to enable access to the FFT/spectral data via debugfs. > + > +config ATH11K_COREDUMP > + bool "ath11k coredump" > + depends on ATH11K > + select WANT_DEV_COREDUMP > + help > + Enable ath11k coredump collection > + > + If unsure, say Y to make it easier to debug problems. But if > + dump collection not required choose N. I'm not sure if a new Kconfig option is justified? Maybe instead just use CONFIG_DEV_COREDUMP directly. -- https://patchwork.kernel.org/project/linux-wireless/list/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches