Search Linux Wireless

Re: [PATCH v2] wifi: ath11k: Add firmware coredump collection support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 7/11/2024 12:20 AM, Kalle Valo wrote:
Miaoqing Pan <quic_miaoqing@xxxxxxxxxxx> writes:

In case of firmware assert snapshot of firmware memory is essential for
debugging. Add firmware coredump collection support for PCI bus.
Collect RDDM and firmware paging dumps from MHI and pack them in TLV
format and also pack various memory shared during QMI phase in separate
TLVs.  Add necessary header and share the dumps to user space using dev
coredump framework. Coredump collection is disabled by default and can
be enabled using menuconfig. Dump collected for a radio is 55 MB
approximately.

The changeset is mostly copied from:
https://lore.kernel.org/all/20240325183414.4016663-1-quic_ssreeela@xxxxxxxxxxx/.

Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-04358-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1

Signed-off-by: Miaoqing Pan <quic_miaoqing@xxxxxxxxxxx>

This has a similar KASAN warning as did the ath12k patch:

[   48.364483] modprobe (1116) used greatest stack depth: 21760 bytes left
[   48.450859] ath11k_pci 0000:06:00.0: chip_id 0x2 chip_family 0xb board_id 0x106 soc_id 0x400c0200
[   48.451252] ath11k_pci 0000:06:00.0: fw_version 0x11088c35 fw_build_timestamp 2024-04-17 08:34 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41
[   63.694922] ath11k_pci 0000:06:00.0: simulating firmware assert crash
[   64.118179] ath11k_pci 0000:06:00.0: firmware crashed: MHI_CB_EE_RDDM
[   64.132388] ==================================================================
[   64.132470] BUG: KASAN: vmalloc-out-of-bounds in ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci]
[   64.132530] Write of size 4 at addr ffffc9000497520c by task kworker/u32:2/88
[   64.132578]
[   64.132610] CPU: 5 PID: 88 Comm: kworker/u32:2 Not tainted 6.10.0-rc6-wt-ath+ #1678
[   64.132659] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0067.2021.0528.1339 05/28/2021
[   64.132719] Workqueue: ath11k_aux_wq ath11k_core_reset [ath11k]
[   64.132791] Call Trace:
[   64.132827]  <TASK>
[   64.132867]  dump_stack_lvl+0x7d/0xe0
[   64.132910]  print_address_description.constprop.0+0x33/0x3a0
[   64.132958]  print_report+0xb5/0x260
[   64.133038]  ? kasan_addr_to_slab+0xd/0x80
[   64.133096]  kasan_report+0xd8/0x110
[   64.133132]  ? ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci]
[   64.133179]  ? ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci]
[   64.133225]  __asan_report_store_n_noabort+0x12/0x20
[   64.133266]  ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci]
[   64.133311]  ? ath11k_pci_coredump_calculate_size+0x710/0x710 [ath11k_pci]
[   64.133358]  ? lock_sync+0x1a0/0x1a0
[   64.133398]  ath11k_coredump_collect+0x60/0x73 [ath11k]
[   64.133466]  ath11k_core_reset+0x225/0x640 [ath11k]
[   64.133524]  ? debug_smp_processor_id+0x17/0x20
[   64.133564]  process_one_work+0x8cc/0x19c0
[   64.133893]  ? pwq_dec_nr_in_flight+0x580/0x580
[   64.133934]  ? move_linked_works+0x128/0x2c0
[   64.134007]  ? assign_work+0x15e/0x270
[   64.134074]  worker_thread+0x715/0x1230
[   64.134114]  ? __this_cpu_preempt_check+0x13/0x20
[   64.134153]  ? lockdep_hardirqs_on+0x7d/0x100
[   64.134192]  ? rescuer_thread+0xdb0/0xdb0
[   64.134229]  kthread+0x2fa/0x3f0
[   64.134266]  ? kthread_insert_work_sanity_check+0xd0/0xd0
[   64.134308]  ret_from_fork+0x31/0x70
[   64.134345]  ? kthread_insert_work_sanity_check+0xd0/0xd0
[   64.134386]  ret_from_fork_asm+0x11/0x20
[   64.134426]  </TASK>
[   64.134459]
[   64.134498] The buggy address belongs to the virtual mapping at#012[   64.134498]  [ffffc90003965000, ffffc90004977000) created by:#012[   64.134498]  ath11k_pci_coredump_download+0x144/0x12c0 [ath11k_pci]
[   64.134576]
[   64.134606] The buggy address belongs to the physical page:
[   64.134648] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13faee
[   64.134699] flags: 0x200000000000000(node=0|zone=2)
[   64.134746] raw: 0200000000000000 0000000000000000 dead000000000122 0000000000000000
[   64.134796] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[   64.135523] page dumped because: kasan: bad access detected
[   64.136273]
[   64.136928] Memory state around the buggy address:
[   64.137641]  ffffc90004975100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   64.138361]  ffffc90004975180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   64.139074] >ffffc90004975200: 00 04 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[   64.139742]                       ^
[   64.140495]  ffffc90004975280: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[   64.141222]  ffffc90004975300: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[   64.141892] ==================================================================
[   64.142674] Disabling lock debugging due to kernel taint
[   64.143630] ath11k_pci 0000:06:00.0: Uploading coredump

Thanks for catching this issue, it's triggered by the statement of 'dump_tlv->type = cpu_to_le32(mem_idx);' The 'buf' which assigned to 'dump_tlv' is out-of-bounds. Will be fixed in the next version.


---
v2: fix implicit declaration of function 'vzalloc'.

  drivers/net/wireless/ath/ath11k/Kconfig    |  11 ++
  drivers/net/wireless/ath/ath11k/Makefile   |   1 +
  drivers/net/wireless/ath/ath11k/core.c     |   2 +
  drivers/net/wireless/ath/ath11k/core.h     |   5 +
  drivers/net/wireless/ath/ath11k/coredump.c |  52 ++++++
  drivers/net/wireless/ath/ath11k/coredump.h |  79 +++++++++
  drivers/net/wireless/ath/ath11k/hif.h      |   7 +
  drivers/net/wireless/ath/ath11k/mhi.c      |   5 +
  drivers/net/wireless/ath/ath11k/mhi.h      |   1 +
  drivers/net/wireless/ath/ath11k/pci.c      | 191 +++++++++++++++++++++
  drivers/net/wireless/ath/ath11k/qmi.c      |  45 ++---
  drivers/net/wireless/ath/ath11k/qmi.h      |   9 +-
  12 files changed, 384 insertions(+), 24 deletions(-)
  create mode 100644 drivers/net/wireless/ath/ath11k/coredump.c
  create mode 100644 drivers/net/wireless/ath/ath11k/coredump.h

I feel that the QMI changes should be in a separat patch and explaining
in detail what they are about. Didn't review those now as there's no
explanation.

Minor changes for updating 'iaddr' definition. IMO, don't need a separate patch.
struct target_mem_chunk {
        u32 prev_size;
        u32 prev_type;
        dma_addr_t paddr;
-       u32 *vaddr;
-       void __iomem *iaddr;
+       union {
+               u32 *vaddr;
+               void __iomem *iaddr;
+       } v;
 };



diff --git a/drivers/net/wireless/ath/ath11k/Kconfig b/drivers/net/wireless/ath/ath11k/Kconfig
index 27f0523bf967..bb91da0098b4 100644
--- a/drivers/net/wireless/ath/ath11k/Kconfig
+++ b/drivers/net/wireless/ath/ath11k/Kconfig
@@ -57,3 +57,14 @@ config ATH11K_SPECTRAL
  	  Enable ath11k spectral scan support
Say Y to enable access to the FFT/spectral data via debugfs.
+
+config ATH11K_COREDUMP
+	bool "ath11k coredump"
+	depends on ATH11K
+	select WANT_DEV_COREDUMP
+	help
+	  Enable ath11k coredump collection
+
+	  If unsure, say Y to make it easier to debug problems. But if
+	  dump collection not required choose N.

I'm not sure if a new Kconfig option is justified? Maybe instead just
use CONFIG_DEV_COREDUMP directly.

ok, will be removed.





[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Wireless Personal Area Network]     [Linux Bluetooth]     [Wireless Regulations]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Hiking]     [MIPS Linux]     [ARM Linux]     [Linux RAID]

  Powered by Linux