Re: wfx: Memory corruption during high traffic with WFM200 on i.MX6Q platform

Hi Jérôme,

Probably a Thunderbird mess-up. Let's try again; I hope it works this time. I probably fiddled too much with the settings while trying to make it send plain text.

We're trying to get a WFM200S022XNN3 module working on a custom i.MX6Q board over the SDIO interface, using an upstream kernel.
Our patches primarily concern the device tree for the board, and we use the upstream firmware from the linux-firmware repository.

During that, we stumbled upon a memory corruption issue, which appears when heavy traffic passes through the device.
Our adapter runs in AP mode. The issue reproduces 100% of the time using iperf3:
we start an AP interface and an iperf3 server on the device.
Then, the client station runs "iperf3 -c <hostname> -t 3600", so the AP is sending data for up to one hour.
However, the kernel on our device crashes after a few minutes of traffic, sometimes in less than a minute.

The behaviour is the same on kernels v5.19.7, v5.19.2, and even v6.0-rc5. Tests on v6.0-rc5 have produced the most detailed stack trace so far:

8<--- cut here ---
Unable to handle kernel NULL pointer dereference at virtual address 00000101
[00000101] *pgd=00000000
Internal error: Oops: 17 [#1] PREEMPT SMP ARM
Modules linked in: xt_LOG nf_log_syslog xt_limit iptable_mangle xt_connmark xt_tcpudp xt_conntrack
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables cdc_mbim cdc_wdm cdc_ncm
cdc_ether usbnet cdc_acm usb_serial_simple usbserial usb_f_rndis u_ether wfx mac80211 libarc4 cfg80211 evbug
phy_generic ci_hdrc_imx ci_hdrc adt7475 hwmon_vid ulpi roles usbmisc_imx pwm_imx27
pwm_beeper libcomposite configfs udc_core
CPU: 0 PID: 10 Comm: ksoftirqd/0 Not tainted 6.0.0-rc5-dnm3pv2+g047dc4cf9a10 #1
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
PC is at kfree_skb_list_reason+0x10/0x24
LR is at ieee80211_report_used_skb+0xd0/0x5b4 [mac80211]
pc : [<80773238>]    lr : [<7f136538>]    psr: 20000113
sp : f0801e60  ip : 00000000  fp : 838f04e2
r10: 00000001  r9 : 838f04e2  r8 : 00000000
r7 : 82661580  r6 : 00000000  r5 : 82660580  r4 : 00000101
r3 : 838f0700  r2 : 00000032  r1 : 00000001  r0 : 00000101
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 11d0004a  DAC: 00000051
Register r0 information: non-paged memory
Register r1 information: non-paged memory
Register r2 information: non-paged memory
Register r3 information: slab kmalloc-1k start 838f0400 pointer offset 768 size 1024
Register r4 information: non-paged memory
Register r5 information: slab kmalloc-8k start 82660000 pointer offset 1408 size 8192
Register r6 information: NULL pointer
Register r7 information: slab kmalloc-8k start 82660000 pointer offset 5504 size 8192
Register r8 information: NULL pointer
Register r9 information: slab kmalloc-1k start 838f0400 pointer offset 226 size 1024
Register r10 information: non-paged memory
Register r11 information: slab kmalloc-1k start 838f0400 pointer offset 226 size 1024
Register r12 information: NULL pointer
Process ksoftirqd/0 (pid: 10, stack limit = 0x1fff5f96)
Stack: (0xf0801e60 to 0xf0802000)
1e60: 8393cd80 7f136538 00000000 81590f34 80f050b4 20000193 f0801ecc 7f189a7c
1e80: 00000032 00000005 823f0458 f0801f18 81c51a00 8368504c 7f189854 83898000
1ea0: 8226ac40 40000210 00000200 80f04ec8 f17ddddc 00000000 f0801f18 82660580
1ec0: 8393cd80 00000000 00000000 8393cd98 838f04e2 7f13791c 00000000 00000000
1ee0: 82660580 00004288 00000000 838f04e2 82660580 8393cd98 82660580 838f04e2
1f00: 82660a8c 7f1906b0 7f190708 00000000 40000006 7f137d18 8368578c 8393cd98
1f20: 8393cd80 00000000 00000000 00000000 00000000 00000000 82660a8c 80f04ec8
1f40: 8393cd80 82660580 82660a7c 7f1347f8 00000000 80f04ec8 00000001 82660a64
1f60: 00000000 eefad338 00000000 00000006 80be7f14 801246f8 00000006 80f03098
1f80: 80f03080 81504c80 00000101 8010140c f0861e78 80915818 8225e100 f0801f90
1fa0: 80f03080 80e543c0 80c059f4 0000000a 80e56a40 80e56a40 80e54334 80c284f4
1fc0: 00005a10 80f03d40 80a01e20 04208040 80c059f4 80e56a40 20000013 ffffffff
1fe0: f0861eb4 81504c80 81504c80 80f050b4 f0861e78 801245ac 80144024 804772fc
kfree_skb_list_reason from ieee80211_report_used_skb+0xd0/0x5b4 [mac80211]
ieee80211_report_used_skb [mac80211] from ieee80211_tx_status_ext+0x4c8/0x850 [mac80211]
ieee80211_tx_status_ext [mac80211] from ieee80211_tx_status+0x74/0x9c [mac80211]
ieee80211_tx_status [mac80211] from ieee80211_tasklet_handler+0xb0/0xd8 [mac80211]
ieee80211_tasklet_handler [mac80211] from tasklet_action_common.constprop.0+0xb0/0xc4
tasklet_action_common.constprop.0 from __do_softirq+0x14c/0x2c0
__do_softirq from irq_exit+0x98/0xc8
irq_exit from call_with_stack+0x18/0x20
call_with_stack from __irq_svc+0x98/0xc8
Exception stack(0xf0861e80 to 0xf0861ec8)
1e80: 00000001 00000002 00000001 81504c80 eefafdc0 00000000 81590880 00000000
1ea0: 81504c80 81505248 80f050b4 f0861f14 f0861f18 f0861ed0 80915bec 80144024
1ec0: 20000013 ffffffff
__irq_svc from finish_task_switch+0xa8/0x270
finish_task_switch from __schedule+0x25c/0x628
__schedule from schedule+0x5c/0xb4
schedule from smpboot_thread_fn+0xbc/0x23c
smpboot_thread_fn from kthread+0xf4/0x124
kthread from ret_from_fork+0x14/0x2c
Exception stack(0xf0861fb0 to 0xf0861ff8)
1fa0:                                     00000000 00000000 00000000 00000000
1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
Code: e92d4010 e2504000 08bd8010 e1a00004 (e5944000)  
[  5]  24.00-25.00  sec   765 KBytes  6.27 Mbits/sec
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Fatal exception in interrupt
CPU2: stopping
CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D            6.0.0-rc5-dnm3pv2+g047dc4cf9a10 #1
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x40/0x4c
dump_stack_lvl from do_handle_IPI+0x100/0x128
do_handle_IPI from ipi_handler+0x18/0x20
ipi_handler from handle_percpu_devid_irq+0x8c/0x138
handle_percpu_devid_irq from generic_handle_domain_irq+0x24/0x34
generic_handle_domain_irq from gic_handle_irq+0x74/0x88
gic_handle_irq from generic_handle_arch_irq+0x58/0x78
generic_handle_arch_irq from call_with_stack+0x18/0x20
call_with_stack from __irq_svc+0x98/0xc8
Exception stack(0xf0871f10 to 0xf0871f58)
1f00:                                     00000002 80bf66e8 00000001 6e16f000
1f20: 00000000 80f0a668 00000000 00000000 a05c2adc a0629de7 eefc50c8 0000007b
1f40: fffffff5 f0871f60 80155d84 807006d8 60030013 ffffffff
__irq_svc from cpuidle_enter_state+0x158/0x358
cpuidle_enter_state from cpuidle_enter+0x40/0x50
cpuidle_enter from do_idle+0x19c/0x208
do_idle from cpu_startup_entry+0x18/0x1c
cpu_startup_entry from secondary_start_kernel+0x148/0x150
secondary_start_kernel from 0x10101620
CPU3: stopping
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G      D            6.0.0-rc5-dnm3pv2+g047dc4cf9a10 #1
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x40/0x4c
dump_stack_lvl from do_handle_IPI+0x100/0x128
do_handle_IPI from ipi_handler+0x18/0x20
ipi_handler from handle_percpu_devid_irq+0x8c/0x138
handle_percpu_devid_irq from generic_handle_domain_irq+0x24/0x34
generic_handle_domain_irq from gic_handle_irq+0x74/0x88
gic_handle_irq from generic_handle_arch_irq+0x58/0x78
generic_handle_arch_irq from call_with_stack+0x18/0x20
call_with_stack from __irq_svc+0x98/0xc8
Exception stack(0xf0875f10 to 0xf0875f58)
5f00:                                     00000003 80bf66e8 00000001 6e17a000
5f20: 00000000 80f0a668 00000000 00000000 a05c5ef1 a0629de7 eefd00c8 0000007b
5f40: fffffff5 f0875f60 80155d84 807006d8 60000013 ffffffff
__irq_svc from cpuidle_enter_state+0x158/0x358
cpuidle_enter_state from cpuidle_enter+0x40/0x50
cpuidle_enter from do_idle+0x19c/0x208
do_idle from cpu_startup_entry+0x18/0x1c
cpu_startup_entry from secondary_start_kernel+0x148/0x150
secondary_start_kernel from 0x10101620
CPU1: stopping
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D            6.0.0-rc5-dnm3pv2+g047dc4cf9a10 #1
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x40/0x4c
dump_stack_lvl from do_handle_IPI+0x100/0x128
do_handle_IPI from ipi_handler+0x18/0x20
ipi_handler from handle_percpu_devid_irq+0x8c/0x138
handle_percpu_devid_irq from generic_handle_domain_irq+0x24/0x34
generic_handle_domain_irq from gic_handle_irq+0x74/0x88
gic_handle_irq from generic_handle_arch_irq+0x58/0x78
generic_handle_arch_irq from call_with_stack+0x18/0x20
call_with_stack from __irq_svc+0x98/0xc8
Exception stack(0xf086df10 to 0xf086df58)
df00:                                     00000001 80bf66e8 00000001 6e164000
df20: 00000000 80f0a668 00000000 00000000 a05c2d77 a0629de7 eefba0c8 0000007b
df40: fffffff5 f086df60 80155d84 807006d8 600e0013 ffffffff
__irq_svc from cpuidle_enter_state+0x158/0x358
cpuidle_enter_state from cpuidle_enter+0x40/0x50
cpuidle_enter from do_idle+0x19c/0x208
do_idle from cpu_startup_entry+0x18/0x1c
cpu_startup_entry from secondary_start_kernel+0x148/0x150
secondary_start_kernel from 0x10101620
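
As a cross-check of the "Register rN information" lines in the first oops, the reported slab offsets can be recomputed from the register values. The sketch below is only illustrative arithmetic; the helper name slab_offset is made up for this example, and all values are copied from the dump above. It confirms that r3 and r9 point into the same kmalloc-1k object, and that r5 and r7 point into the same kmalloc-8k object.

```python
def slab_offset(pointer, slab_start, slab_size):
    """Offset of `pointer` inside a slab object, or None if it lies outside."""
    offset = pointer - slab_start
    return offset if 0 <= offset < slab_size else None

# (register value, slab object start, object size), copied from the v6.0-rc5 oops
registers = {
    "r3": (0x838F0700, 0x838F0400, 1024),
    "r5": (0x82660580, 0x82660000, 8192),
    "r7": (0x82661580, 0x82660000, 8192),
    "r9": (0x838F04E2, 0x838F0400, 1024),
}

offsets = {name: slab_offset(*values) for name, values in registers.items()}
for name, offset in offsets.items():
    print(f"{name}: offset {offset}")
```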

However, the corruption can manifest itself in other ways as well,
sometimes even damaging the contents of the onboard NAND flash.
Similar traces have previously appeared in other places too.
In addition to testing on 6.0-rc5, we tried cherry-picking 047dc4cf9a10b4f2dc164b8bf192de583f3ebfee
from wireless-next, but at first glance it seems unrelated to the issue,
and it doesn't prevent the crashes.

Below are the relevant bits of the device tree we used to get the module to work.
We're using the in-band IRQ of the SDIO interface:

/ {
         wfx_pwrseq: wfx_pwrseq {
                 compatible = "mmc-pwrseq-simple";
                 pinctrl-names = "default";
                 pinctrl-0 = <&pinctrl_wfx_reset>;
                 reset-gpios = <&gpio7 8 GPIO_ACTIVE_LOW>;
         };
 };

&iomuxc {
         usdhc1 {
                 pinctrl_usdhc1_3: usdhc1grp-3 {
                         fsl,pins = <
                                 MX6QDL_PAD_SD1_CMD__SD1_CMD    0x17059
                                 MX6QDL_PAD_SD1_CLK__SD1_CLK    0x10059
                                 MX6QDL_PAD_SD1_DAT0__SD1_DATA0 0x17059
                                 MX6QDL_PAD_SD1_DAT1__SD1_DATA1 0x17059
                                 MX6QDL_PAD_SD1_DAT2__SD1_DATA2 0x17059
                                 MX6QDL_PAD_SD1_DAT3__SD1_DATA3 0x17059
                                 MX6QDL_PAD_SD3_CLK__GPIO7_IO03 0x17041
                                 MX6QDL_PAD_SD3_CMD__GPIO7_IO02 0x13019
                         >;
                 };

                 pinctrl_wfx_reset: wfx-reset-grp {
                         fsl,pins = <
                                 MX6QDL_PAD_SD3_RST__GPIO7_IO08 0x1B030
                         >;
                 };
         };
};

&usdhc1 {
         status = "okay";
         #address-cells = <1>;
         #size-cells = <0>;
         pinctrl-names = "default";
         pinctrl-0 = <&pinctrl_usdhc1_3>;
         cap-power-off-card;
         keep-power-in-suspend;
         cap-sdio-irq;
         wakeup-source;
         disable-wp;
         cap-sd-highspeed;
         bus-width = <4>;
         non-removable;
         no-mmc;
         no-sd;
         mmc-pwrseq = <&wfx_pwrseq>;
         wifi@1 {
                 compatible = "silabs,brd8023a";
                 reg = <1>;
                 wakeup-gpios = <&gpio7 2 GPIO_ACTIVE_HIGH>;
         };
};

With that, the device probes successfully, and we can get 22 Mbps of traffic with a 1T1R peer
in HT20 mode in both directions.
SDIO signals were checked with an oscilloscope, and they look perfectly fine,
so I think we can rule out a hardware issue.

By adding a canary to the slab allocator, we found that the skb structures get damaged,
and are then improperly dereferenced by the driver somewhere in the TX queue handling code.

With SMP disabled, the issue still manifests itself, hinting at a synchronization issue
between the interrupt context and the tasklets handling the bulk of the work.
However, it usually takes longer to reproduce, though still on the order of a few minutes.
In some cases the kernel detects a use-after-free by itself, without any modification,
or the reference counts get corrupted.

This stacktrace comes from one of the runs with CONFIG_SMP disabled:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 10 at lib/refcount.c:28 ieee80211_tx_status_ext+0x4f8/0x968 [mac80211]
refcount_t: underflow; use-after-free.
Modules linked in: xt_LOG nf_log_syslog xt_limit iptable_mangle xt_connmark xt_tcpudp xt_conntrack
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables cdc_mbim cdc_wdm cdc_ncm
cdc_ether usbnet cdc_acm usb_serial_simple usbserial usb_f_rndis u_ether wfx mac80211 libarc4 evbug
phy_generic cfg80211 adt7475 hwmon_vid ci_hdrc_imx ci_hdrc ulpi roles usbmisc_imx pwm_imx27
pwm_beeper libcomposite configfs udc_core
CPU: 0 PID: 10 Comm: ksoftirqd/0 Tainted: G        W         5.19.2+ge4fb6643395f #1
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x24/0x2c
dump_stack_lvl from __warn+0xb0/0xd8
__warn from warn_slowpath_fmt+0x98/0xc8
warn_slowpath_fmt from ieee80211_tx_status_ext+0x4f8/0x968 [mac80211]
ieee80211_tx_status_ext [mac80211] from ieee80211_tx_status+0x74/0x9c [mac80211]
ieee80211_tx_status [mac80211] from ieee80211_tasklet_handler+0xb0/0xd8 [mac80211]
ieee80211_tasklet_handler [mac80211] from tasklet_action_common.constprop.0+0xb4/0xc0
tasklet_action_common.constprop.0 from __do_softirq+0x12c/0x290
__do_softirq from irq_exit+0x90/0xbc
irq_exit from call_with_stack+0x18/0x20
call_with_stack from __irq_svc+0x94/0xc4
Exception stack(0xf0859e98 to 0xf0859ee0)
9e80:                                                       00000001 81080780
9ea0: 00000001 81080780 00000000 00000002 822f0780 808e82cc 81080780 81080c50
9ec0: 00000000 f0859f14 f0859f18 f0859ee8 801404f0 80140624 20000013 ffffffff
__irq_svc from finish_task_switch+0x78/0x1f8
finish_task_switch from __schedule+0x244/0x580
__schedule from schedule+0x5c/0xb4
schedule from smpboot_thread_fn+0xb8/0x224
smpboot_thread_fn from kthread+0xe4/0x114
kthread from ret_from_fork+0x14/0x2c
Exception stack(0xf0859fb0 to 0xf0859ff8)
9fa0:                                     00000000 00000000 00000000 00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1131 at lib/refcount.c:22 __tcp_transmit_skb+0x7a4/0xa8c
refcount_t: saturated; leaking memory.
Modules linked in: xt_LOG nf_log_syslog xt_limit iptable_mangle xt_connmark xt_tcpudp xt_conntrack
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables cdc_mbim cdc_wdm cdc_ncm
cdc_ether usbnet cdc_acm usb_serial_simple usbserial usb_f_rndis u_ether wfx mac80211 libarc4 evbug
phy_generic cfg80211 adt7475 hwmon_vid ci_hdrc_imx ci_hdrc ulpi roles usbmisc_imx pwm_imx27
pwm_beeper libcomposite configfs udc_core
CPU: 0 PID: 1131 Comm: kworker/0:2H Tainted: G        W         5.19.2+ge4fb6643395f #1
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Workqueue: wfx_bh_wq bh_work [wfx]
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x24/0x2c
dump_stack_lvl from __warn+0xb0/0xd8
__warn from warn_slowpath_fmt+0x98/0xc8
warn_slowpath_fmt from __tcp_transmit_skb+0x7a4/0xa8c
__tcp_transmit_skb from __tcp_send_ack.part.0+0xd0/0x13c
__tcp_send_ack.part.0 from tcp_delack_timer_handler+0xb0/0x180
tcp_delack_timer_handler from tcp_delack_timer+0x2c/0x128
tcp_delack_timer from call_timer_fn.constprop.0+0x18/0x80
call_timer_fn.constprop.0 from run_timer_softirq+0x2ec/0x3b0
run_timer_softirq from __do_softirq+0x12c/0x290
__do_softirq from call_with_stack+0x18/0x20
call_with_stack from do_softirq+0x6c/0x70
do_softirq from __local_bh_enable_ip+0xd8/0xdc
__local_bh_enable_ip from __netdev_alloc_skb+0x14c/0x170
__netdev_alloc_skb from bh_work+0x1b0/0x650 [wfx]
bh_work [wfx] from process_one_work+0x1b8/0x3ec
process_one_work from worker_thread+0x4c/0x57c
worker_thread from kthread+0xe4/0x114
kthread from ret_from_fork+0x14/0x2c
Exception stack(0xf161dfb0 to 0xf161dff8)
dfa0:                                     00000000 00000000 00000000 00000000
dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
---[ end trace 0000000000000000 ]---
[  5] 536.16-537.00 sec  26.9 KBytes   261 Kbits/sec                   
[  5] 537.00-538.00 sec  2.71 MBytes  22.7 Mbits/sec                   
8<--- cut here ---
Unable to handle kernel NULL pointer dereference at virtual address 0000011c
[0000011c] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT ARM
Modules linked in: xt_LOG nf_log_syslog xt_limit iptable_mangle xt_connmark xt_tcpudp
xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables
cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet cdc_acm usb_serial_simple usbserial
usb_f_rndis u_ether wfx mac80211 libarc4 evbug phy_generic cfg80211 adt7475
hwmon_vid ci_hdrc_imx ci_hdrc ulpi roles usbmisc_imx pwm_imx27 pwm_beeper
libcomposite configfs udc_core
CPU: 0 PID: 10 Comm: ksoftirqd/0 Tainted: G        W         5.19.2+ge4fb6643395f #1
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
PC is at ip6_rcv_core+0x110/0x68c
LR is at ip6_rcv_core+0xb0/0x68c
pc : [<8084d278>]    lr : [<8084d218>]    psr: 20000013
sp : f0859e18  ip : 00000000  fp : 80e13cc0
r10: 00000000  r9 : 80e13cf4  r8 : 81b65000
r7 : 80e6d7c8  r6 : 82024c00  r5 : 812a8760  r4 : 81be5b40
r3 : 00000000  r2 : 00000100  r1 : 000000d7  r0 : 00000000
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c53c7d  Table: 12338059  DAC: 00000051
Register r0 information: NULL pointer
Register r1 information: non-paged memory
Register r2 information: non-paged memory
Register r3 information: NULL pointer
Register r4 information: slab skbuff_head_cache start 81be5b40 pointer offset 0 size 48
Register r5 information: non-slab/vmalloc memory
Register r6 information: slab kmalloc-1k start 82024c00 pointer offset 0 size 1024
Register r7 information: non-slab/vmalloc memory
Register r8 information: slab kmalloc-2k start 81b65000 pointer offset 0 size 2048
Register r9 information: non-slab/vmalloc memory
Register r10 information: NULL pointer
Register r11 information: non-slab/vmalloc memory
Register r12 information: NULL pointer
Process ksoftirqd/0 (pid: 10, stack limit = 0x7cac7060)
Stack: (0xf0859e18 to 0xf085a000)
9e00:                                                       81b65000 80e13d00
9e20: 80e6d7c8 80e13cc8 00000040 80e13cf4 00000000 8084da90 80d0ce80 80d0424c
9e40: 80d0ce80 81b65000 80e13d00 00000001 80e13cc8 80d0424c 8084da60 80e13d00
9e60: 00000001 807691c0 00000001 81be5b40 80d06654 80d0424c 81be5b40 80769348
9e80: 00000001 80e13d00 00000040 f0859ecb 80dd6000 00008b6a f0859ed4 80769ec4
9ea0: 00000001 81080780 00000000 80e13d00 0000012c 00000000 f0859ecc 8076a2d8
9ec0: 00008b6c 81080780 00859f18 f0859ecc f0859ecc f0859ed4 f0859ed4 80d0424c
9ee0: 00000051 00000000 00000003 80e15834 80e15828 81080780 00000100 80adb4e4
9f00: 40000003 801013f4 821d9540 00000000 f0859f5c 80e15828 80d0d390 80e13c80
9f20: 80af6e3c 0000000a 80d0b588 80b19518 00008b6b 80dd6000 04208040 80901dd0
9f40: 81080780 00000000 8102de00 81080780 80d0b558 00000001 00000001 00000000
9f60: 00000000 80120a18 00000000 8013e590 8102de40 8102df00 8013e42c 8102de00
9f80: 81080780 f0835e30 00000000 8013a85c 8102de40 8013a778 00000000 00000000
9fa0: 00000000 00000000 00000000 80100148 00000000 00000000 00000000 00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
ip6_rcv_core from ipv6_rcv+0x30/0xd4
ipv6_rcv from __netif_receive_skb_one_core+0x5c/0x80
__netif_receive_skb_one_core from process_backlog+0x70/0xe4
process_backlog from __napi_poll+0x2c/0x1f0
__napi_poll from net_rx_action+0x140/0x264
net_rx_action from __do_softirq+0x12c/0x290
__do_softirq from run_ksoftirqd+0x34/0x3c
run_ksoftirqd from smpboot_thread_fn+0x164/0x224
smpboot_thread_fn from kthread+0xe4/0x114
kthread from ret_from_fork+0x14/0x2c
Exception stack(0xf0859fb0 to 0xf0859ff8)
9fa0:                                     00000000 00000000 00000000 00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
Code: e5843024 e5843028 e584302c 0a000055 (e1d231bc)  
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Fatal exception in interrupt
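
For reference, the faulting instruction in each oops (the word in parentheses on the "Code:" line) can be decoded by hand and checked against the reported fault address. The sketch below assumes the standard ARM (A32) encodings for LDR (immediate) and LDRH (immediate) and handles only those two forms; the function name decode_load is made up for this example, and the register values are copied from the dumps above. In both cases the computed address equals the reported one (0x101 and 0x11c), so the base register held a small bogus value instead of a valid pointer.

```python
def decode_load(insn, regs):
    """Decode an ARM (A32) LDR/LDRH immediate-offset load.

    Returns (mnemonic, base register number, effective address).
    Only the two encodings seen in the oopses are handled.
    """
    pre_indexed = (insn >> 24) & 1   # P bit: offset is applied to the access address
    add = (insn >> 23) & 1           # U bit: add (1) or subtract (0) the offset
    is_load = (insn >> 20) & 1       # L bit
    rn = (insn >> 16) & 0xF          # base register
    group = (insn >> 25) & 0x7

    if group == 0b010 and is_load and not (insn >> 22) & 1:
        # LDR (immediate): 12-bit offset in bits 11..0
        mnemonic, imm = "ldr", insn & 0xFFF
    elif (group == 0b000 and is_load and (insn >> 22) & 1
          and (insn >> 4) & 0xF == 0b1011):
        # LDRH (immediate): 8-bit offset split across bits 11..8 and 3..0
        mnemonic, imm = "ldrh", ((insn >> 4) & 0xF0) | (insn & 0xF)
    else:
        raise ValueError(f"unsupported encoding: {insn:08x}")

    base = regs[rn]
    offset = imm if add else -imm
    address = base + offset if pre_indexed else base
    return mnemonic, rn, address & 0xFFFFFFFF


# v6.0-rc5 oops: (e5944000), with r4 = 00000101
first = decode_load(0xE5944000, {4: 0x00000101})
# v5.19.2 oops: (e1d231bc), with r2 = 00000100
second = decode_load(0xE1D231BC, {2: 0x00000100})

for mnemonic, rn, address in (first, second):
    print(f"{mnemonic} via r{rn}: faulting address 0x{address:08x}")
```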

Now, the questions:
- Is "silabs,brd8023a" the proper compatible string for WFM200S022XNN3, or should we create our
  own for the bare module, even if only the in-band SDIO IRQ and an external antenna are in use?
- In order to try out the out-of-band IRQ, in the hope that it somehow resolves the issue, do we need to create a custom PDS file?
  With the IRQ enabled, probe fails with a "Chip did not answer" error.
- Tracing memory corruption is hard. Is there a mechanism that could help us better than generic methods like kprobes
  or implementing canaries? As skbs are heavily reused for performance reasons, tracing their lifecycle is especially hard.
  Our first idea was to write-protect their respective pages once they are enqueued in the wfx TX queue,
  so that the MMU detects the corruption at the exact time it happens, but we haven't figured out how to modify the skb allocator to accomplish that,
  especially given that the issue mostly happens when transmitting, so the skbs are allocated outside of the driver.
  Maybe a similar mechanism exists that could help us out, even if it is still in the works?

Any help will be greatly appreciated. We'll be very happy to provide a patch if we manage to figure the issue out.


On 12.09.2022 at 18:15, Jérôme Pouiller wrote:
> On Monday 12 September 2022 17:16:24 CEST Lech Perczak wrote:
>> Hello,
>>
>> We're trying to get a WFM200S022XNN3 module working on a custom i.MX6Q board using SDIO interface, using upstream kernel. Our patches concern primarily the device tree for the board - and upstream firmware from linux-firmware repository.
>>
>> During that, we stumbled upon a memory corruption issue, which appears when big traffic is passing through the device. Our adapter is running in AP mode. This can be reproduced with 100% rate using iperf3, by starting an AP interface on the device, and an iperf3 server. Then, the client station runs iperf3 with "iperf3 -c <hostname> -t 3600" command - so the AP is sending data for up to one hour, however - the kernel on our device crashes after around a few minutes of traffic, sometimes less than a minute.
>>
>> The behaviour is the same on kernel v5.19.7, v5.19.2, and even with v6.0-rc5. Tests on v6.0-rc5 have shown most detailed stacktrace so far:
>>
> Hello Lech,
>
> It seems that something somewhere (Ms Exchange, I am looking at you) has
> removed all the newlines of your mail :-/. Can you try to fix the problem?
> I think that sending mails using base64 encoding would solve the issue.
>
>
> [...]
>
> --
> Jérôme Pouiller

-- 
Pozdrawiam/With kind regards,
Lech Perczak

Sr. Software Engineer
Camlin Technologies Poland Limited Sp. z o.o.
Strzegomska 54,
53-611 Wroclaw
Tel:     (+48) 71 75 000 16
Email:   lech.perczak@xxxxxxxxxxxxxxx
Website: http://www.camlingroup.com




