Hi Jérôme,

Just a quick note so you don't have to redo our work: Paweł found the root cause, and a patch is coming very shortly.
TL;DR: hw->max_rates in wfx_init_common() was initially set to 8, which is above the maximum of 4 (IEEE80211_TX_MAX_RATES) supported by mac80211, causing out-of-bounds writes all over the place.

Kind regards,
Lech

On 12.09.2022 at 18:46, Lech Perczak wrote:
> Hi Jérôme,
>
> Probably a Thunderbird mess-up. Let's try again; I hope it works. I probably fiddled too much with the settings to make it send plain text.
>
> We're trying to get a WFM200S022XNN3 module working on a custom i.MX6Q board over the SDIO interface, using the upstream kernel.
> Our patches primarily concern the device tree for the board, and we use the upstream firmware from the linux-firmware repository.
>
> During that, we stumbled upon a memory corruption issue, which appears when heavy traffic passes through the device.
> Our adapter is running in AP mode. This can be reproduced 100% of the time using iperf3,
> by starting an AP interface on the device, and an iperf3 server.
> Then, the client station runs iperf3 with the "iperf3 -c <hostname> -t 3600" command, so the AP is sending data for up to one hour;
> however, the kernel on our device crashes after a few minutes of traffic, sometimes less than a minute.
>
> The behaviour is the same on kernels v5.19.7 and v5.19.2, and even on v6.0-rc5.
> Tests on v6.0-rc5 have shown the most detailed stack trace so far:
>
> 8<--- cut here ---
> Unable to handle kernel NULL pointer dereference at virtual address 00000101
> [00000101] *pgd=00000000
> Internal error: Oops: 17 [#1] PREEMPT SMP ARM
> Modules linked in: xt_LOG nf_log_syslog xt_limit iptable_mangle xt_connmark xt_tcpudp xt_conntrack
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables cdc_mbim cdc_wdm cdc_ncm
> cdc_ether usbnet cdc_acm usb_serial_simple usbserial usb_f_rndis u_ether wfx mac80211 libarc4 cfg80211 evbug
> phy_generic ci_hdrc_imx ci_hdrc adt7475 hwmon_vid ulpi roles usbmisc_imx pwm_imx27
> pwm_beeper libcomposite configfs udc_core
> CPU: 0 PID: 10 Comm: ksoftirqd/0 Not tainted 6.0.0-rc5-dnm3pv2+g047dc4cf9a10 #1
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> PC is at kfree_skb_list_reason+0x10/0x24
> LR is at ieee80211_report_used_skb+0xd0/0x5b4 [mac80211]
> pc : [<80773238>] lr : [<7f136538>] psr: 20000113
> sp : f0801e60 ip : 00000000 fp : 838f04e2
> r10: 00000001 r9 : 838f04e2 r8 : 00000000
> r7 : 82661580 r6 : 00000000 r5 : 82660580 r4 : 00000101
> r3 : 838f0700 r2 : 00000032 r1 : 00000001 r0 : 00000101
> Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
> Control: 10c5387d Table: 11d0004a DAC: 00000051
> Register r0 information: non-paged memory
> Register r1 information: non-paged memory
> Register r2 information: non-paged memory
> Register r3 information: slab kmalloc-1k start 838f0400 pointer offset 768 size 1024
> Register r4 information: non-paged memory
> Register r5 information: slab kmalloc-8k start 82660000 pointer offset 1408 size 8192
> Register r6 information: NULL pointer
> Register r7 information: slab kmalloc-8k start 82660000 pointer offset 5504 size 8192
> Register r8 information: NULL pointer
> Register r9 information: slab kmalloc-1k start 838f0400 pointer offset 226 size 1024
> Register r10 information: non-paged memory
> Register r11 information: slab kmalloc-1k start 838f0400 pointer offset 226 size 1024
> Register r12 information: NULL pointer
> Process ksoftirqd/0 (pid: 10, stack limit = 0x1fff5f96)
> Stack: (0xf0801e60 to 0xf0802000)
> 1e60: 8393cd80 7f136538 00000000 81590f34 80f050b4 20000193 f0801ecc 7f189a7c
> 1e80: 00000032 00000005 823f0458 f0801f18 81c51a00 8368504c 7f189854 83898000
> 1ea0: 8226ac40 40000210 00000200 80f04ec8 f17ddddc 00000000 f0801f18 82660580
> 1ec0: 8393cd80 00000000 00000000 8393cd98 838f04e2 7f13791c 00000000 00000000
> 1ee0: 82660580 00004288 00000000 838f04e2 82660580 8393cd98 82660580 838f04e2
> 1f00: 82660a8c 7f1906b0 7f190708 00000000 40000006 7f137d18 8368578c 8393cd98
> 1f20: 8393cd80 00000000 00000000 00000000 00000000 00000000 82660a8c 80f04ec8
> 1f40: 8393cd80 82660580 82660a7c 7f1347f8 00000000 80f04ec8 00000001 82660a64
> 1f60: 00000000 eefad338 00000000 00000006 80be7f14 801246f8 00000006 80f03098
> 1f80: 80f03080 81504c80 00000101 8010140c f0861e78 80915818 8225e100 f0801f90
> 1fa0: 80f03080 80e543c0 80c059f4 0000000a 80e56a40 80e56a40 80e54334 80c284f4
> 1fc0: 00005a10 80f03d40 80a01e20 04208040 80c059f4 80e56a40 20000013 ffffffff
> 1fe0: f0861eb4 81504c80 81504c80 80f050b4 f0861e78 801245ac 80144024 804772fc
> kfree_skb_list_reason from ieee80211_report_used_skb+0xd0/0x5b4 [mac80211]
> ieee80211_report_used_skb [mac80211] from ieee80211_tx_status_ext+0x4c8/0x850 [mac80211]
> ieee80211_tx_status_ext [mac80211] from ieee80211_tx_status+0x74/0x9c [mac80211]
> ieee80211_tx_status [mac80211] from ieee80211_tasklet_handler+0xb0/0xd8 [mac80211]
> ieee80211_tasklet_handler [mac80211] from tasklet_action_common.constprop.0+0xb0/0xc4
> tasklet_action_common.constprop.0 from __do_softirq+0x14c/0x2c0
> __do_softirq from irq_exit+0x98/0xc8
> irq_exit from call_with_stack+0x18/0x20
> call_with_stack from __irq_svc+0x98/0xc8
> Exception stack(0xf0861e80 to 0xf0861ec8)
> 1e80: 00000001 00000002 00000001 81504c80 eefafdc0 00000000 81590880 00000000
> 1ea0: 81504c80 81505248 80f050b4 f0861f14 f0861f18 f0861ed0 80915bec 80144024
> 1ec0: 20000013 ffffffff
> __irq_svc from finish_task_switch+0xa8/0x270
> finish_task_switch from __schedule+0x25c/0x628
> __schedule from schedule+0x5c/0xb4
> schedule from smpboot_thread_fn+0xbc/0x23c
> smpboot_thread_fn from kthread+0xf4/0x124
> kthread from ret_from_fork+0x14/0x2c
> Exception stack(0xf0861fb0 to 0xf0861ff8)
> 1fa0: 00000000 00000000 00000000 00000000
> 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> Code: e92d4010 e2504000 08bd8010 e1a00004 (e5944000)
> [ 5] 24.00-25.00 sec 765 KBytes 6.27 Mbits/sec
> ---[ end trace 0000000000000000 ]---
> Kernel panic - not syncing: Fatal exception in interrupt
> CPU2: stopping
> CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D 6.0.0-rc5-dnm3pv2+g047dc4cf9a10 #1
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> unwind_backtrace from show_stack+0x10/0x14
> show_stack from dump_stack_lvl+0x40/0x4c
> dump_stack_lvl from do_handle_IPI+0x100/0x128
> do_handle_IPI from ipi_handler+0x18/0x20
> ipi_handler from handle_percpu_devid_irq+0x8c/0x138
> handle_percpu_devid_irq from generic_handle_domain_irq+0x24/0x34
> generic_handle_domain_irq from gic_handle_irq+0x74/0x88
> gic_handle_irq from generic_handle_arch_irq+0x58/0x78
> generic_handle_arch_irq from call_with_stack+0x18/0x20
> call_with_stack from __irq_svc+0x98/0xc8
> Exception stack(0xf0871f10 to 0xf0871f58)
> 1f00: 00000002 80bf66e8 00000001 6e16f000
> 1f20: 00000000 80f0a668 00000000 00000000 a05c2adc a0629de7 eefc50c8 0000007b
> 1f40: fffffff5 f0871f60 80155d84 807006d8 60030013 ffffffff
> __irq_svc from cpuidle_enter_state+0x158/0x358
> cpuidle_enter_state from cpuidle_enter+0x40/0x50
> cpuidle_enter from do_idle+0x19c/0x208
> do_idle from cpu_startup_entry+0x18/0x1c
> cpu_startup_entry from secondary_start_kernel+0x148/0x150
> secondary_start_kernel from 0x10101620
> CPU3: stopping
> CPU: 3 PID: 0 Comm: swapper/3 Tainted: G D 6.0.0-rc5-dnm3pv2+g047dc4cf9a10 #1
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> unwind_backtrace from show_stack+0x10/0x14
> show_stack from dump_stack_lvl+0x40/0x4c
> dump_stack_lvl from do_handle_IPI+0x100/0x128
> do_handle_IPI from ipi_handler+0x18/0x20
> ipi_handler from handle_percpu_devid_irq+0x8c/0x138
> handle_percpu_devid_irq from generic_handle_domain_irq+0x24/0x34
> generic_handle_domain_irq from gic_handle_irq+0x74/0x88
> gic_handle_irq from generic_handle_arch_irq+0x58/0x78
> generic_handle_arch_irq from call_with_stack+0x18/0x20
> call_with_stack from __irq_svc+0x98/0xc8
> Exception stack(0xf0875f10 to 0xf0875f58)
> 5f00: 00000003 80bf66e8 00000001 6e17a000
> 5f20: 00000000 80f0a668 00000000 00000000 a05c5ef1 a0629de7 eefd00c8 0000007b
> 5f40: fffffff5 f0875f60 80155d84 807006d8 60000013 ffffffff
> __irq_svc from cpuidle_enter_state+0x158/0x358
> cpuidle_enter_state from cpuidle_enter+0x40/0x50
> cpuidle_enter from do_idle+0x19c/0x208
> do_idle from cpu_startup_entry+0x18/0x1c
> cpu_startup_entry from secondary_start_kernel+0x148/0x150
> secondary_start_kernel from 0x10101620
> CPU1: stopping
> CPU: 1 PID: 0 Comm: swapper/1 Tainted: G D 6.0.0-rc5-dnm3pv2+g047dc4cf9a10 #1
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> unwind_backtrace from show_stack+0x10/0x14
> show_stack from dump_stack_lvl+0x40/0x4c
> dump_stack_lvl from do_handle_IPI+0x100/0x128
> do_handle_IPI from ipi_handler+0x18/0x20
> ipi_handler from handle_percpu_devid_irq+0x8c/0x138
> handle_percpu_devid_irq from generic_handle_domain_irq+0x24/0x34
> generic_handle_domain_irq from gic_handle_irq+0x74/0x88
> gic_handle_irq from generic_handle_arch_irq+0x58/0x78
> generic_handle_arch_irq from call_with_stack+0x18/0x20
> call_with_stack from __irq_svc+0x98/0xc8
> Exception stack(0xf086df10 to 0xf086df58)
> df00: 00000001 80bf66e8 00000001 6e164000
> df20: 00000000 80f0a668 00000000 00000000 a05c2d77 a0629de7 eefba0c8 0000007b
> df40: fffffff5 f086df60 80155d84 807006d8 600e0013 ffffffff
> __irq_svc from cpuidle_enter_state+0x158/0x358
> cpuidle_enter_state from cpuidle_enter+0x40/0x50
> cpuidle_enter from do_idle+0x19c/0x208
> do_idle from cpu_startup_entry+0x18/0x1c
> cpu_startup_entry from secondary_start_kernel+0x148/0x150
> secondary_start_kernel from 0x10101620
>
> However, the corruption can manifest itself in other ways as well,
> sometimes even damaging the contents of the onboard NAND flash.
> Similar traces have previously appeared in other places too.
> In addition to testing on 6.0-rc5, we tried cherry-picking 047dc4cf9a10b4f2dc164b8bf192de583f3ebfee
> from wireless-next as well, but at first glance it seems unrelated to the issue,
> and it doesn't prevent the crashes.
>
> Below are the relevant bits of the device tree we used to get the module working.
> We're using the in-band IRQ of the SDIO interface:
>
> / {
> 	wfx_pwrseq: wfx_pwrseq {
> 		compatible = "mmc-pwrseq-simple";
> 		pinctrl-names = "default";
> 		pinctrl-0 = <&pinctrl_wfx_reset>;
> 		reset-gpios = <&gpio7 8 GPIO_ACTIVE_LOW>;
> 	};
> };
>
> &iomuxc {
> 	usdhc1 {
> 		pinctrl_usdhc1_3: usdhc1grp-3 {
> 			fsl,pins = <
> 				MX6QDL_PAD_SD1_CMD__SD1_CMD 0x17059
> 				MX6QDL_PAD_SD1_CLK__SD1_CLK 0x10059
> 				MX6QDL_PAD_SD1_DAT0__SD1_DATA0 0x17059
> 				MX6QDL_PAD_SD1_DAT1__SD1_DATA1 0x17059
> 				MX6QDL_PAD_SD1_DAT2__SD1_DATA2 0x17059
> 				MX6QDL_PAD_SD1_DAT3__SD1_DATA3 0x17059
> 				MX6QDL_PAD_SD3_CLK__GPIO7_IO03 0x17041
> 				MX6QDL_PAD_SD3_CMD__GPIO7_IO02 0x13019
> 			>;
> 		};
>
> 		pinctrl_wfx_reset: wfx-reset-grp {
> 			fsl,pins = <
> 				MX6QDL_PAD_SD3_RST__GPIO7_IO08 0x1B030
> 			>;
> 		};
> 	};
> };
>
> &usdhc1 {
> 	status = "okay";
> 	#address-cells = <1>;
> 	#size-cells = <0>;
> 	pinctrl-names = "default";
> 	pinctrl-0 = <&pinctrl_usdhc1_3>;
> 	cap-power-off-card;
> 	keep-power-in-suspend;
> 	cap-sdio-irq;
> 	wakeup-source;
> 	disable-wp;
> 	cap-sd-highspeed;
> 	bus-width = <4>;
> 	non-removable;
> 	no-mmc;
> 	no-sd;
> 	mmc-pwrseq = <&wfx_pwrseq>;
> 	wifi@1 {
> 		compatible = "silabs,brd8023a";
> 		reg = <1>;
> 		wakeup-gpios = <&gpio7 2 GPIO_ACTIVE_HIGH>;
> 	};
> };
>
> With that, the device probes successfully, and we can get 22 Mbps of traffic with a 1T1R peer
> in HT20 mode in both directions.
> The SDIO signals were checked with an oscilloscope, and they look perfectly fine,
> so I think we can rule out a hardware issue.
>
> By adding a canary to the slab allocator, we managed to find that the skb structures get damaged
> and are then improperly dereferenced by the driver somewhere in the TX queue handling code.
>
> With SMP disabled, the issue still manifests itself, hinting at a synchronization issue
> between the interrupt context and the tasklets handling the bulk of the work.
> However, it usually takes longer to reproduce, though still on the order of a few minutes.
> In some cases the kernel would detect a use-after-free by itself (without modification),
> or the reference counts would get corrupted.
>
> This stack trace comes from one of the runs with CONFIG_SMP disabled:
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 10 at lib/refcount.c:28 ieee80211_tx_status_ext+0x4f8/0x968 [mac80211]
> refcount_t: underflow; use-after-free.
> Modules linked in: xt_LOG nf_log_syslog xt_limit iptable_mangle xt_connmark xt_tcpudp xt_conntrack
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables cdc_mbim cdc_wdm cdc_ncm
> cdc_ether usbnet cdc_acm usb_serial_simple usbserial usb_f_rndis u_ether wfx mac80211 libarc4 evbug
> phy_generic cfg80211 adt7475 hwmon_vid ci_hdrc_imx ci_hdrc ulpi roles usbmisc_imx pwm_imx27
> pwm_beeper libcomposite configfs udc_core
> CPU: 0 PID: 10 Comm: ksoftirqd/0 Tainted: G W 5.19.2+ge4fb6643395f #1
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> unwind_backtrace from show_stack+0x10/0x14
> show_stack from dump_stack_lvl+0x24/0x2c
> dump_stack_lvl from __warn+0xb0/0xd8
> __warn from warn_slowpath_fmt+0x98/0xc8
> warn_slowpath_fmt from ieee80211_tx_status_ext+0x4f8/0x968 [mac80211]
> ieee80211_tx_status_ext [mac80211] from ieee80211_tx_status+0x74/0x9c [mac80211]
> ieee80211_tx_status [mac80211] from ieee80211_tasklet_handler+0xb0/0xd8 [mac80211]
> ieee80211_tasklet_handler [mac80211] from tasklet_action_common.constprop.0+0xb4/0xc0
> tasklet_action_common.constprop.0 from __do_softirq+0x12c/0x290
> __do_softirq from irq_exit+0x90/0xbc
> irq_exit from call_with_stack+0x18/0x20
> call_with_stack from __irq_svc+0x94/0xc4
> Exception stack(0xf0859e98 to 0xf0859ee0)
> 9e80: 00000001 81080780
> 9ea0: 00000001 81080780 00000000 00000002 822f0780 808e82cc 81080780 81080c50
> 9ec0: 00000000 f0859f14 f0859f18 f0859ee8 801404f0 80140624 20000013 ffffffff
> __irq_svc from finish_task_switch+0x78/0x1f8
> finish_task_switch from __schedule+0x244/0x580
> __schedule from schedule+0x5c/0xb4
> schedule from smpboot_thread_fn+0xb8/0x224
> smpboot_thread_fn from kthread+0xe4/0x114
> kthread from ret_from_fork+0x14/0x2c
> Exception stack(0xf0859fb0 to 0xf0859ff8)
> 9fa0: 00000000 00000000 00000000 00000000
> 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> ---[ end trace 0000000000000000 ]---
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 1131 at lib/refcount.c:22 __tcp_transmit_skb+0x7a4/0xa8c
> refcount_t: saturated; leaking memory.
> Modules linked in: xt_LOG nf_log_syslog xt_limit iptable_mangle xt_connmark xt_tcpudp xt_conntrack
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables cdc_mbim cdc_wdm cdc_ncm
> cdc_ether usbnet cdc_acm usb_serial_simple usbserial usb_f_rndis u_ether wfx mac80211 libarc4 evbug
> phy_generic cfg80211 adt7475 hwmon_vid ci_hdrc_imx ci_hdrc ulpi roles usbmisc_imx pwm_imx27
> pwm_beeper libcomposite configfs udc_core
> CPU: 0 PID: 1131 Comm: kworker/0:2H Tainted: G W 5.19.2+ge4fb6643395f #1
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> Workqueue: wfx_bh_wq bh_work [wfx]
> unwind_backtrace from show_stack+0x10/0x14
> show_stack from dump_stack_lvl+0x24/0x2c
> dump_stack_lvl from __warn+0xb0/0xd8
> __warn from warn_slowpath_fmt+0x98/0xc8
> warn_slowpath_fmt from __tcp_transmit_skb+0x7a4/0xa8c
> __tcp_transmit_skb from __tcp_send_ack.part.0+0xd0/0x13c
> __tcp_send_ack.part.0 from tcp_delack_timer_handler+0xb0/0x180
> tcp_delack_timer_handler from tcp_delack_timer+0x2c/0x128
> tcp_delack_timer from call_timer_fn.constprop.0+0x18/0x80
> call_timer_fn.constprop.0 from run_timer_softirq+0x2ec/0x3b0
> run_timer_softirq from __do_softirq+0x12c/0x290
> __do_softirq from call_with_stack+0x18/0x20
> call_with_stack from do_softirq+0x6c/0x70
> do_softirq from __local_bh_enable_ip+0xd8/0xdc
> __local_bh_enable_ip from __netdev_alloc_skb+0x14c/0x170
> __netdev_alloc_skb from bh_work+0x1b0/0x650 [wfx]
> bh_work [wfx] from process_one_work+0x1b8/0x3ec
> process_one_work from worker_thread+0x4c/0x57c
> worker_thread from kthread+0xe4/0x114
> kthread from ret_from_fork+0x14/0x2c
> Exception stack(0xf161dfb0 to 0xf161dff8)
> dfa0: 00000000 00000000 00000000 00000000
> dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> ---[ end trace 0000000000000000 ]---
> [ 5] 536.16-537.00 sec 26.9 KBytes 261 Kbits/sec
> [ 5] 537.00-538.00 sec 2.71 MBytes 22.7 Mbits/sec
> 8<--- cut here ---
> Unable to handle kernel NULL pointer dereference at virtual address 0000011c
> [0000011c] *pgd=00000000
> Internal error: Oops: 5 [#1] PREEMPT ARM
> Modules linked in: xt_LOG nf_log_syslog xt_limit iptable_mangle xt_connmark xt_tcpudp
> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables
> cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet cdc_acm usb_serial_simple usbserial
> usb_f_rndis u_ether wfx mac80211 libarc4 evbug phy_generic cfg80211 adt7475
> hwmon_vid ci_hdrc_imx ci_hdrc ulpi roles usbmisc_imx pwm_imx27 pwm_beeper
> libcomposite configfs udc_core
> CPU: 0 PID: 10 Comm: ksoftirqd/0 Tainted: G W 5.19.2+ge4fb6643395f #1
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> PC is at ip6_rcv_core+0x110/0x68c
> LR is at ip6_rcv_core+0xb0/0x68c
> pc : [<8084d278>] lr : [<8084d218>] psr: 20000013
> sp : f0859e18 ip : 00000000 fp : 80e13cc0
> r10: 00000000 r9 : 80e13cf4 r8 : 81b65000
> r7 : 80e6d7c8 r6 : 82024c00 r5 : 812a8760 r4 : 81be5b40
> r3 : 00000000 r2 : 00000100 r1 : 000000d7 r0 : 00000000
> Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
> Control: 10c53c7d Table: 12338059 DAC: 00000051
> Register r0 information: NULL pointer
> Register r1 information: non-paged memory
> Register r2 information: non-paged memory
> Register r3 information: NULL pointer
> Register r4 information: slab skbuff_head_cache start 81be5b40 pointer offset 0 size 48
> Register r5 information: non-slab/vmalloc memory
> Register r6 information: slab kmalloc-1k start 82024c00 pointer offset 0 size 1024
> Register r7 information: non-slab/vmalloc memory
> Register r8 information: slab kmalloc-2k start 81b65000 pointer offset 0 size 2048
> Register r9 information: non-slab/vmalloc memory
> Register r10 information: NULL pointer
> Register r11 information: non-slab/vmalloc memory
> Register r12 information: NULL pointer
> Process ksoftirqd/0 (pid: 10, stack limit = 0x7cac7060)
> Stack: (0xf0859e18 to 0xf085a000)
> 9e00: 81b65000 80e13d00
> 9e20: 80e6d7c8 80e13cc8 00000040 80e13cf4 00000000 8084da90 80d0ce80 80d0424c
> 9e40: 80d0ce80 81b65000 80e13d00 00000001 80e13cc8 80d0424c 8084da60 80e13d00
> 9e60: 00000001 807691c0 00000001 81be5b40 80d06654 80d0424c 81be5b40 80769348
> 9e80: 00000001 80e13d00 00000040 f0859ecb 80dd6000 00008b6a f0859ed4 80769ec4
> 9ea0: 00000001 81080780 00000000 80e13d00 0000012c 00000000 f0859ecc 8076a2d8
> 9ec0: 00008b6c 81080780 00859f18 f0859ecc f0859ecc f0859ed4 f0859ed4 80d0424c
> 9ee0: 00000051 00000000 00000003 80e15834 80e15828 81080780 00000100 80adb4e4
> 9f00: 40000003 801013f4 821d9540 00000000 f0859f5c 80e15828 80d0d390 80e13c80
> 9f20: 80af6e3c 0000000a 80d0b588 80b19518 00008b6b 80dd6000 04208040 80901dd0
> 9f40: 81080780 00000000 8102de00 81080780 80d0b558 00000001 00000001 00000000
> 9f60: 00000000 80120a18 00000000 8013e590 8102de40 8102df00 8013e42c 8102de00
> 9f80: 81080780 f0835e30 00000000 8013a85c 8102de40 8013a778 00000000 00000000
> 9fa0: 00000000 00000000 00000000 80100148 00000000 00000000 00000000 00000000
> 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
> ip6_rcv_core from ipv6_rcv+0x30/0xd4
> ipv6_rcv from __netif_receive_skb_one_core+0x5c/0x80
> __netif_receive_skb_one_core from process_backlog+0x70/0xe4
> process_backlog from __napi_poll+0x2c/0x1f0
> __napi_poll from net_rx_action+0x140/0x264
> net_rx_action from __do_softirq+0x12c/0x290
> __do_softirq from run_ksoftirqd+0x34/0x3c
> run_ksoftirqd from smpboot_thread_fn+0x164/0x224
> smpboot_thread_fn from kthread+0xe4/0x114
> kthread from ret_from_fork+0x14/0x2c
> Exception stack(0xf0859fb0 to 0xf0859ff8)
> 9fa0: 00000000 00000000 00000000 00000000
> 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> Code: e5843024 e5843028 e584302c 0a000055 (e1d231bc)
> ---[ end trace 0000000000000000 ]---
> Kernel panic - not syncing: Fatal exception in interrupt
>
> Now, the questions:
> - Is "silabs,brd8023a" the proper compatible string for the WFM200S022XNN3, or should we create our
> own for the bare module, even if just the in-band SDIO IRQ and an external antenna are in use?
> - In order to try out the out-of-band IRQ, in the hope that it resolves the issue somehow, do we need to create a custom PDS file?
> With the out-of-band IRQ enabled, the probe fails with a "Chip did not answer" error.
> - Tracing memory corruption is hard. Is there a mechanism that could help us more than generic methods like kprobes
> or implementing canaries? As skbs are heavily reused for performance reasons, tracing their lifecycle is especially hard.
> Our first idea was to write-protect their pages once they are enqueued in the wfx TX queue,
> so that the MMU detects the corruption at the exact moment it happens, but we haven't figured out how to modify the skb allocator to accomplish that,
> especially given that the issue mostly happens when transmitting, so the skbs are allocated outside of the driver.
> Maybe a similar mechanism exists that could help us, even if it is just in the works?
>
> Any help will be greatly appreciated; we'll be very happy to provide a patch if we manage to figure out the issue.
>
>
> On 12.09.2022 at 18:15, Jérôme Pouiller wrote:
>> On Monday 12 September 2022 17:16:24 CEST Lech Perczak wrote:
>>> Hello,
>>>
>>> We're trying to get a WFM200S022XNN3 module working on a custom i.MX6Q board using SDIO interface, using upstream kernel. Our patches concern primarily the device tree for the board - and upstream firmware from linux-firmware repository.
>>>
>>> During that, we stumbled upon a memory corruption issue, which appears when big traffic is passing through the device. Our adapter is running in AP mode. This can be reproduced with 100% rate using iperf3, by starting an AP interface on the device, and an iperf3 server. Then, the client station runs iperf3 with "iperf3 -c <hostname> -t 3600" command - so the AP is sending data for up to one hour, however - the kernel on our device crashes after around a few minutes of traffic, sometimes less than a minute.
>>>
>>> The behaviour is the same on kernel v5.19.7, v5.19.2, and even with v6.0-rc5. Tests on v6.0-rc5 have shown most detailed stacktrace so far:
>>>
>> Hello Lech,
>>
>> It seems that something somewhere (MS Exchange, I am looking at you) has
>> removed all the newlines from your mail :-/. Can you try to fix the problem?
>> I think that sending mails using base64 encoding would solve the issue.
>>
>>
>> [...]
>>
>> --
>> Jérôme Pouiller

--
With kind regards,
Lech Perczak
Sr. Software Engineer
Camlin Technologies Poland Limited Sp. z o.o.
Strzegomska 54, 53-611 Wroclaw
Tel: (+48) 71 75 000 16
Email: lech.perczak@xxxxxxxxxxxxxxx
Website: http://www.camlingroup.com
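For anyone reading this thread later, the root cause in the note at the top (hw->max_rates set to 8 in wfx_init_common, above mac80211's limit of 4) can be illustrated with a small userspace C sketch. It assumes, as a simplification, that the per-frame rate table is a fixed array of IEEE80211_TX_MAX_RATES (4) entries followed by other per-frame state, roughly as mac80211 keeps its rate arrays inside the skb control buffer. The struct layout and every sketch_* name are hypothetical; this is not mac80211 or wfx code, and not the actual patch.

```c
/*
 * Userspace sketch, not mac80211/wfx code: why advertising more
 * rate-retry stages than the per-frame rate table holds corrupts memory.
 * SKETCH_TX_MAX_RATES mirrors mac80211's IEEE80211_TX_MAX_RATES (4);
 * every sketch_* name here is hypothetical.
 */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_TX_MAX_RATES 4

/* Simplified stand-in for one rate-retry stage. */
struct sketch_tx_rate {
	signed char idx;     /* rate index to try */
	unsigned char count; /* number of attempts at that rate */
};

/* Simplified stand-in for per-frame TX info: a fixed-size rate table
 * followed by other per-frame state. */
struct sketch_tx_info {
	struct sketch_tx_rate rates[SKETCH_TX_MAX_RATES];
	void *next_field; /* hypothetical neighbour an overrun would clobber */
};

/* Nonzero if a driver-advertised max_rates exceeds the table size. */
static int sketch_max_rates_overflows(size_t hw_max_rates)
{
	return hw_max_rates > SKETCH_TX_MAX_RATES;
}

/* Number of stages that can be written safely: max_rates clamped to the
 * table size. Looping up to an unclamped value of 8 would write entries
 * 4..7 past the end of rates[], starting with next_field. */
static size_t sketch_usable_rates(size_t hw_max_rates)
{
	return hw_max_rates < SKETCH_TX_MAX_RATES ? hw_max_rates
						  : SKETCH_TX_MAX_RATES;
}

/* Fills the rate table using the clamped count and returns that count. */
static size_t sketch_fill_rates(struct sketch_tx_info *info,
				size_t hw_max_rates)
{
	size_t n = sketch_usable_rates(hw_max_rates);
	size_t i;

	for (i = 0; i < n; i++) {
		info->rates[i].idx = (signed char)i;
		info->rates[i].count = 1;
	}
	assert(n <= SKETCH_TX_MAX_RATES);
	return n;
}
```

With the clamp, a requested value of 8 only ever yields 4 filled stages and next_field stays intact. The simpler remedy implied by the note is not to advertise more than 4 stages in the first place, so no caller ever sizes a loop beyond the table.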