Kernel panic every time in kernel when running below case: steps: 1. connect to an AP with good signal strength 2. echo 0x7f > /sys/kernel/debug/ieee80211/phy0/ath10k/pktlog_filter 3. echo 0xffff 0 > /sys/kernel/debug/ieee80211/phy0/ath10k/fw_dbglog 4. echo 0 > /sys/module/ath10k_core/parameters/debug_mask 5. sudo trace-cmd record -e ath10k 6. run "iperf -c 192.168.1.1 -u -b 100m -i 1 -t 30" 7. kernel panic immeditely It always crash at trace_event_raw_event_ath10k_xxx, below is 2 sample: dmesg 1: [ 58.038310] Unable to handle kernel paging request at virtual address 003130393837363d [ 58.038325] Mem abort info: [ 58.038330] ESR = 0x96000004 [ 58.038336] Exception class = DABT (current EL), IL = 32 bits [ 58.038340] SET = 0, FnV = 0 [ 58.038345] EA = 0, S1PTW = 0 [ 58.038349] Data abort info: [ 58.038353] ISV = 0, ISS = 0x00000004 [ 58.038358] CM = 0, WnR = 0 [ 58.038363] [003130393837363d] address between user and kernel address ranges [ 58.038370] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 58.038377] Modules linked in: rfcomm algif_hash algif_skcipher [ 58.038385] Unable to handle kernel paging request at virtual address 003938373635343b [ 58.038386] af_alg uinput cros_ec_rpmsg mtk_dip mtk_mdp3 [ 58.038395] Mem abort info: [ 58.038403] ESR = 0x96000004 [ 58.038412] Exception class = DABT (current EL), IL = 32 bits [ 58.038419] SET = 0, FnV = 0 [ 58.038427] EA = 0, S1PTW = 0 [ 58.038434] Data abort info: [ 58.038442] ISV = 0, ISS = 0x00000004 [ 58.038449] CM = 0, WnR = 0 [ 58.038457] [003938373635343b] address between user and kernel address ranges [ 58.038486] Process kworker/u16:1 (pid: 141, stack limit = 0x000000009bd6ba2e) [ 58.038496] CPU: 1 PID: 141 Comm: kworker/u16:1 Not tainted 4.19.139 #162 [ 58.038501] Hardware name: MediaTek krane sku176 board (DT) [ 58.038515] Workqueue: ath10k_sdio_wq ath10k_sdio_write_async_work [ath10k_sdio] [ 58.038523] pstate: 80000005 (Nzcv daif -PAN -UAO) [ 58.038536] pc : filter_match_preds+0x20/0x94 [ 58.038546] lr : trace_event_buffer_commit+0x150/0x22c [ 58.038550] sp : ffffff80088a39f0 [ 58.038555] x29: ffffff80088a3a20 x28: 000000000000000c [ 58.038561] x27: 000000000000000c x26: 0000000000000001 [ 58.038567] x25: ffffffdfd96ef008 x24: 0000000000000000 [ 58.038573] x23: 0000000000000000 x22: ffffffe03b006c00 [ 58.038580] x21: ffffffdfd96ef000 x20: 3231303938373635 [ 58.038586] x19: ffffffe02de9b160 x18: 0000000000000020 [ 58.038592] x17: 0000000000100000 x16: ffffffa4e67937f0 [ 58.038598] x15: 0000000000100000 x14: 0000393837363534 [ 58.038604] x13: 3332313039383736 x12: 3534333231303938 [ 58.038611] x11: 3736353433323130 x10: 3938373635343332 [ 58.038617] x9 : 3130393837363534 x8 : 000000000000040b [ 58.038623] x7 : 3534333231303938 x6 : ffffffdfd96fb2d1 [ 58.038629] x5 : 0000000000000000 x4 : 0000000000000000 [ 58.038635] x3 : 000000000000c2cd x2 : ffffffffffffffc0 [ 58.038641] x1 : ffffffdfd96ef008 x0 : ffffffdfd96f7080 [ 58.038648] Call trace: [ 58.038655] filter_match_preds+0x20/0x94 [ 58.038661] trace_event_buffer_commit+0x150/0x22c [ 58.038698] trace_event_raw_event_ath10k_log_dbg_dump+0x208/0x240 [ath10k_core] [ 58.038716] ath10k_dbg_dump+0x1bc/0x21c [ath10k_core] [ 58.038724] ath10k_sdio_write+0xc8/0xe8 [ath10k_sdio] [ 58.038730] ath10k_sdio_write_async_work+0x11c/0x1f4 [ath10k_sdio] [ 58.038738] process_one_work+0x210/0x418 [ 58.038744] worker_thread+0x234/0x3dc [ 58.038751] kthread+0x120/0x140 [ 58.038758] ret_from_fork+0x10/0x18 [ 58.038766] Code: 9100c3fd b40002e0 f9400014 b40002b4 (f9400680) [ 58.038773] ---[ end trace 8b9d2463f441eb7f ]--- [ 58.051846] Kernel panic - not syncing: Fatal exception [ 58.051864] SMP: stopping secondary CPUs dmesg 2: [ 228.692905] Unable to handle kernel paging request at virtual address 0033323130393837 [ 228.692913] Mem abort info: [ 228.692920] ESR = 0x96000004 [ 228.692927] Exception class = DABT (current EL), IL = 32 bits [ 228.692933] SET = 0, FnV = 0 [ 228.692938] EA = 0, S1PTW = 0 [ 228.692943] Data abort info: [ 228.692953] ISV = 0, ISS = 0x00000004 [ 228.692965] CM = 0, WnR = 0 [ 228.692975] [0033323130393837] address between user and kernel address ranges [ 228.692983] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 228.693078] Process kworker/u16:2 (pid: 144, stack limit = 0x000000009f882493) [ 228.693088] CPU: 6 PID: 144 Comm: kworker/u16:2 Tainted: G W 4.19.139 #162 [ 228.693095] Hardware name: MediaTek krane sku176 board (DT) [ 228.693125] Workqueue: ath10k_tx_complete_wq ath10k_htc_tx_complete_work [ath10k_core] [ 228.693137] pstate: 80000005 (Nzcv daif -PAN -UAO) [ 228.693152] pc : filter_match_preds+0x38/0x94 [ 228.693164] lr : trace_event_buffer_commit+0x150/0x22c [ 228.693170] sp : ffffff80088b3b20 [ 228.693175] x29: ffffff80088b3b50 x28: 0000000000000402 [ 228.693183] x27: fffffff040001f60 x26: ffffff9d4ba34000 [ 228.693191] x25: fffffff040af5004 x24: 0000000000000000 [ 228.693198] x23: 0000000000000000 x22: fffffff040af7100 [ 228.693205] x21: 0000000000000000 x20: fffffff040af7100 [ 228.693212] x19: fffffff040af5004 x18: 0000000000000000 [ 228.693219] x17: 000000000000003c x16: ffffff9d4da649d8 [ 228.693226] x15: 0000000000000006 x14: 3837363534333231 [ 228.693233] x13: fffffff01514fc00 x12: 0000000000000000 [ 228.693240] x11: 0000000000000000 x10: fffffff040af5004 [ 228.693247] x9 : 0000000000000000 x8 : 0000000000000000 [ 228.693254] x7 : 646b686064ff6e68 x6 : 0000800000000000 [ 228.693261] x5 : 0000000000000000 x4 : 0000000000000000 [ 228.693268] x3 : 000000000000002c x2 : 0000000000000000 [ 228.693275] x1 : fffffff040af5004 x0 : 3433323130393837 [ 228.693282] Call trace: [ 228.693291] filter_match_preds+0x38/0x94 [ 228.693299] trace_event_buffer_commit+0x150/0x22c [ 228.693321] trace_event_raw_event_ath10k_txrx_tx_unref+0x164/0x198 [ath10k_core] [ 228.693344] ath10k_txrx_tx_unref+0x1c0/0x36c [ath10k_core] [ 228.693365] ath10k_htt_htc_tx_complete+0xe4/0x114 [ath10k_core] [ 228.693386] ath10k_htc_notify_tx_completion+0xe4/0x124 [ath10k_core] [ 228.693407] ath10k_htc_tx_complete_work+0xb8/0xf0 [ath10k_core] [ 228.693418] process_one_work+0x210/0x418 [ 228.693425] worker_thread+0x234/0x3dc [ 228.693433] kthread+0x120/0x140 [ 228.693441] ret_from_fork+0x10/0x18 [ 228.693451] Code: aa0103f3 aa1f03e8 2a1f03f5 aa1403f6 (f9400009) [ 228.693459] ---[ end trace 187b0ae372e7515c ]--- [ 228.711074] Kernel panic - not syncing: Fatal exception [ 228.711086] SMP: stopping secondary CPUs [ 228.711254] Kernel Offset: 0x1d45600000 from 0xffffff8008000000 [ 228.711259] CPU features: 0x0,2188200c [ 228.711262] Memory Limit: none The value of prog in filter_match_preds of kernel/trace/trace_events_filter.c is overwrite to the content of the UDP packets's content like this 0x0039383736353433, it is a invalid address, so crash. Content of udp tx packets with iperf: 0000 00 03 7f 46 c4 49 40 49 0f 2f f9 09 08 00 45 00 ...F.I@I./....E. 0010 05 da 18 00 40 00 40 11 99 4a c0 a8 01 77 c0 a8 ....@.@..J...w.. 0020 01 01 9c 7d 13 89 05 c6 89 a0 00 00 01 8b 5d 69 ...}..........]i 0030 98 9a 00 0a 55 b8 00 00 00 00 00 00 00 01 00 00 ....U........... 0040 13 89 00 00 00 00 00 00 00 00 ff ff f4 48 05 f5 .............H.. 0050 e1 00 30 31 32 33 34 35 36 37 38 39 30 31 32 33 ..01234567890123 0060 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 4567890123456789 0070 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 0123456789012345 0080 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 6789012345678901 0090 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 2345678901234567 00a0 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 8901234567890123 00b0 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 4567890123456789 ... 05b0 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 4567890123456789 05c0 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 0123456789012345 05d0 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 6789012345678901 05e0 32 33 34 35 36 37 38 39 23456789 ath10k_htc_send_bundle_skbs allocate skb with size 49792(1556*32), it is bigger than PAGE_SIZE which is normally 4096, then ath10k_sdio_write call ath10k_dbg_dump to trace the large size skb and corrupt the trace data of tracing and lead crash. When disable the TX bundle of SDIO, it does not crash, but TX bundle is for improve throughput, so it is enabled by default. It is useless to call ath10k_dbg_dump to trace the large bundled skb, so this patch trace the top part of large bundled skb. I also did another try with below change, it is move the ath10k_dbg_dump from ath10k_sdio_write to ath10k_sdio_mbox_rx_fetch_bundle, it also crash immediately when run iperf UDP RX traffic, RX bundle is enabled by default for SDIO, so the virt_pkt_len of ath10k_sdio_mbox_rx_fetch_bundle is large, max value is 57344(1792*32), so it crashed same with upper test. Change of try: diff --git a/drivers/net/wireless/ath/ath10k/sdio.c b/drivers/net/wireless/ath/ath10k/sdio.c --- a/drivers/net/wireless/ath/ath10k/sdio.c +++ b/drivers/net/wireless/ath/ath10k/sdio.c @@ -336,7 +336,6 @@ static int ath10k_sdio_write(struct ath10k *ar, u32 addr, const void *buf, size_ ath10k_dbg(ar, ATH10K_DBG_SDIO, "sdio write addr 0x%x buf 0x%p len %zu\n", addr, buf, len); - ath10k_dbg_dump(ar, ATH10K_DBG_SDIO_DUMP, NULL, "sdio write ", buf, len); out: sdio_release_host(func); @@ -692,6 +691,8 @@ static int ath10k_sdio_mbox_rx_fetch_bundle(struct ath10k *ar) goto err; } + ath10k_dbg_dump(ar, ATH10K_DBG_SDIO_DUMP, NULL, "sdio read vsg", ar_sdio->vsg_buffer, virt_pkt_len); + pkt_offset = 0; for (i = 0; i < ar_sdio->n_rx_pkts; i++) { pkt = &ar_sdio->rx_pkts[i]; dmesg of try: [ 170.621318] Unable to handle kernel paging request at virtual address 0036353433323138 [ 170.621336] Mem abort info: [ 170.621341] ESR = 0x96000004 [ 170.621347] Exception class = DABT (current EL), IL = 32 bits [ 170.621352] SET = 0, FnV = 0 [ 170.621357] EA = 0, S1PTW = 0 [ 170.621362] Data abort info: [ 170.621367] ISV = 0, ISS = 0x00000004 [ 170.621371] CM = 0, WnR = 0 [ 170.621377] [0036353433323138] address between user and kernel address ranges [ 170.621385] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 170.621479] Process kworker/0:1 (pid: 14, stack limit = 0x00000000c6fcfb2a) [ 170.621489] CPU: 0 PID: 14 Comm: kworker/0:1 Not tainted 4.19.139 #162 [ 170.621494] Hardware name: MediaTek krane sku176 board (DT) [ 170.621511] Workqueue: events sdio_irq_work [ 170.621519] pstate: 80000005 (Nzcv daif -PAN -UAO) [ 170.621529] pc : filter_match_preds+0x20/0x94 [ 170.621540] lr : trace_event_buffer_commit+0x150/0x22c [ 170.621545] sp : ffffff80080f37d0 [ 170.621550] x29: ffffff80080f3800 x28: 000000000000000c [ 170.621556] x27: 000000000000000c x26: 0000000000000001 [ 170.621563] x25: fffffff118eb8008 x24: 0000000000000000 [ 170.621569] x23: 0000000000000000 x22: fffffff17b007900 [ 170.621576] x21: fffffff118eb8000 x20: 3736353433323130 [ 170.621583] x19: fffffff178645790 x18: 0000000000000020 [ 170.621589] x17: 000000000000003c x16: ffffff91fe3937f0 [ 170.621596] x15: 0000000000000006 x14: 00000636000ad8b0 [ 170.621602] x13: 0000000000000000 x12: 0000000000000000 [ 170.621608] x11: 0000000000000000 x10: 0000000000000000 [ 170.621615] x9 : 0000000000000000 x8 : 000000000000040b [ 170.621621] x7 : 0000000000000000 x6 : fffffff118ec6052 [ 170.621628] x5 : 0000000000000000 x4 : 0000000000000000 [ 170.621634] x3 : 000000000000e04e x2 : ffffffffffffffc0 [ 170.621640] x1 : fffffff118eb8008 x0 : fffffff118ec1f00 [ 170.621648] Call trace: [ 170.621655] filter_match_preds+0x20/0x94 [ 170.621662] trace_event_buffer_commit+0x150/0x22c [ 170.621704] trace_event_raw_event_ath10k_log_dbg_dump+0x208/0x240 [ath10k_core] [ 170.621724] ath10k_dbg_dump+0x1bc/0x21c [ath10k_core] [ 170.621734] ath10k_sdio_readsb+0xdc/0xfc [ath10k_sdio] [ 170.621742] ath10k_sdio_irq_handler+0x6f8/0xca8 [ath10k_sdio] [ 170.621749] process_sdio_pending_irqs+0x60/0x1b4 [ 170.621756] sdio_run_irqs+0x34/0x60 [ 170.621762] sdio_irq_work+0x1c/0x28 [ 170.621769] process_one_work+0x210/0x418 [ 170.621775] worker_thread+0x234/0x3dc [ 170.621783] kthread+0x120/0x140 [ 170.621791] ret_from_fork+0x10/0x18 [ 170.621799] Code: 9100c3fd b40002e0 f9400014 b40002b4 (f9400680) [ 170.621807] ---[ end trace b52c34b9c8deb8e3 ]--- [ 170.624123] Unable to handle kernel paging request at virtual address dffd74685cec003c [ 170.624134] Mem abort info: [ 170.624138] ESR = 0x96000004 [ 170.624143] Exception class = DABT (current EL), IL = 32 bits [ 170.624146] SET = 0, FnV = 0 [ 170.624150] EA = 0, S1PTW = 0 [ 170.624153] Data abort info: [ 170.624157] ISV = 0, ISS = 0x00000004 [ 170.624160] CM = 0, WnR = 0 [ 170.624165] [dffd74685cec003c] address between user and kernel address ranges [ 170.634352] Kernel panic - not syncing: Fatal exception [ 170.634370] SMP: stopping secondary CPUs [ 171.711302] SMP: failed to stop secondary CPUs 0,4 [ 171.711310] Kernel Offset: 0x11f6200000 from 0xffffff8008000000 [ 171.711316] CPU features: 0x0,2188200c [ 171.711321] Memory Limit: none [ 171.723885] SMP: stopping secondary CPUs [ 172.800814] SMP: failed to stop secondary CPUs 0,4 trace_event_raw_event_ath10k_log_dbg_dump is generated by compiler, it call trace_event_buffer_reserve got get a struct pointer *entry, its type is trace_event_raw_ath10k_log_dbg_dump which is also generated by compiler, trace_event_buffer_reserve of kernel/trace/trace_events.c call trace_event_buffer_lock_reserve to get ring_buffer_event. In function trace_event_buffer_lock_reserve of kernel/trace/trace.c, the ring_buffer_time_stamp_abs is false and trace_file->flags is 0x40b which is set bit of EVENT_FILE_FL_FILTERED by debugging, so it use the temp buffer this_cpu_read(trace_buffered_event), and the buffer size is 1 page size which allocatee in trace_buffered_event_enable by alloc_pages_node, and then ath10k pass the buffer size > 1 page trigger overflow and crash. Based on upper test, try and debugging, pass large buff size to function trace_ath10k_log_dbg_dump cause crash, and it has ath10k_dbg in ath10k_sdio_write to print the length of skb/buffer, it is not necessary to trace all content of the large skb. Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049 Signed-off-by: Wen Gong <wgong@xxxxxxxxxxxxxx> --- drivers/net/wireless/ath/ath10k/debug.c | 2 +- drivers/net/wireless/ath/ath10k/debug.h | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/ath/ath10k/debug.c b/drivers/net/wireless/ath/ath10k/debug.c index e8250a665433..5433dbdde0e5 100644 --- a/drivers/net/wireless/ath/ath10k/debug.c +++ b/drivers/net/wireless/ath/ath10k/debug.c @@ -2718,7 +2718,7 @@ void ath10k_dbg_dump(struct ath10k *ar, /* tracing code doesn't like null strings :/ */ trace_ath10k_log_dbg_dump(ar, msg ? msg : "", prefix ? prefix : "", - buf, len); + buf, min_t(size_t, len, ATH10K_LOG_DUMP_MAX)); } EXPORT_SYMBOL(ath10k_dbg_dump); diff --git a/drivers/net/wireless/ath/ath10k/debug.h b/drivers/net/wireless/ath/ath10k/debug.h index 997c1c80aba7..cec9ba92f28f 100644 --- a/drivers/net/wireless/ath/ath10k/debug.h +++ b/drivers/net/wireless/ath/ath10k/debug.h @@ -11,6 +11,8 @@ #include <linux/types.h> #include "trace.h" +#define ATH10K_LOG_DUMP_MAX 1024 + enum ath10k_debug_mask { ATH10K_DBG_PCI = 0x00000001, ATH10K_DBG_WMI = 0x00000002, -- 2.23.0