
Re: wilc1000 kernel crash


 



Hi Michael,

On 24/10/22 19:24, Michael Walle wrote:
> Hi,
>
> I'm using the WILC1000 wifi chip in SDIO mode and with NetworkManager
> which seems to be probing the network in the background. What I am
> seeing is a kernel oops while processing the workqueue.
>
> This is on a kernel 5.15.74, but it also happens with the latest next,
> but not that often - I guess due to a different timing.
>
> My reduced steps to reproduce are the following:
>    $ while true; do ifconfig wlan0 up; iw dev wlan0 scan & \
>        ifconfig wlan0 down; done

It looks like a timing issue. While running the steps above, I 
observed that, most of the time, the interface down command is executed 
before the scan command. Only occasionally is the scan command executed 
between interface up and down.

>
> After a while I'll get the following splash:
>
> [  487.955326] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> [  487.955363] Mem abort info:
> [  487.955366]   ESR = 0x96000004
> [  487.955370]   EC = 0x25: DABT (current EL), IL = 32 bits
> [  487.965939] FW not responding
> [  487.971033]   SET = 0, FnV = 0
> [  487.971039]   EA = 0, S1PTW = 0
> [  487.971043]   FSC = 0x04: level 0 translation fault
> [  487.971047] Data abort info:
> [  487.971050]   ISV = 0, ISS = 0x00000004
> [  487.971053]   CM = 0, WnR = 0
> [  487.971059] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000497b0000
> [  487.971066] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
> [  487.971085] Internal error: Oops: 96000004 [#1] SMP
> [  487.971094] Modules linked in:
> [  487.971104] CPU: 1 PID: 9 Comm: kworker/u8:0 Not tainted 5.15.74-00013-g2d5897cb12ef #130
> [  487.971113] Hardware name: NXP i.MX8MNano DDR3L EVK board (DT)
> [  487.971122] Workqueue: WILC_wq handle_rcvd_ntwrk_info
> [  488.035377] wilc1000_sdio mmc1:0001:1: chipid (001003a0)
> [  487.971085] Internal error: Oops:
> [  488.041180] 96000004 [#1] SMP
> [  488.041186] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  488.041196] pc : handle_rcvd_ntwrk_info+0x7c/0xc4
> [  488.041208] lr : handle_rcvd_ntwrk_info+0x70/0xc4
> [  488.049128] wilc1000_sdio mmc1:0001:1: has_thrpt_enh3 = 1...
> [  488.057273] sp : ffff80000a20bd70
> [  488.057277] x29: ffff80000a20bd70 x28: 0000000000000000 x27: 0000000000000000
> [  488.057289] x26: ffff000000118470 x25: ffff000005059d05 x24: ffff00000de94d30
> [  488.057299] x23: 0000000000000000
> [  488.062670] wilc1000_sdio mmc1:0001:1 wlan0: ChipID [1003a0] loading firmware [atmel/wilc1000_wifi_firmware-1.bin]
> [  488.070418]  x22: ffff000005059d00 x21: 0000000000000000
> [  488.070428] x20: ffff00000de94d00 x19: ffff00000de94d28 x18: 0000000000000000
> [  488.070440] x17: 0000000000000000 x16: 0000000000000000 x15: a4270000a4030001
> [  488.070450] x14: 010102f2500018dd x13: 0018dd0000010002 x12: 0546000000000000
> [  488.070461] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800008ad92a0
> [  488.150644] x8 : fefefefefefefeff x7 : 0000000000000018 x6 : ffff0000063e88c0
> [  488.157799] x5 : 0000000000000000 x4 : 0000000000000003 x3 : 0000000000000000
> [  488.164947] x2 : 0000000000000000 x1 : 0000000000000001 x0 : 0000000000000001
> [  488.172095] Call trace:
> [  488.174548]  handle_rcvd_ntwrk_info+0x7c/0xc4
> [  488.175400] FW not responding
> [  488.178927]  process_one_work+0x1ec/0x48c
> [  488.178941]  worker_thread+0x170/0x564
> [  488.178948]  kthread+0x128/0x13c
> [  488.178959]  ret_from_fork+0x10/0x20
> [  488.185280] FW not responding
> [  488.185958] Code: 9415ea8e b4000060 39400401 35000201 (f94002a3)
> [  488.192042] FW not responding
> [  488.192957] ---[ end trace fa915dc840cf0355 ]---
> [  488.199700] FW not responding
> [  488.205601] Kernel panic - not syncing: Oops: Fatal exception
> [  488.205608] SMP: stopping secondary CPUs
> [  488.205630] Kernel Offset: disabled
> [  488.205634] CPU features: 0x00002001,20000846
> [  488.205642] Memory Limit: none
>
> In handle_rcvd_ntwrk_info() scan_req->scan_result isn't valid anymore,
> although it doesn't contain NULL. Thus the driver is calling into a
> bogus function pointer. There seems to be no locking between the
> asynchronous calls within the workqueue (wilc_enqueue_work()) and when
> the interface is disabled (wilc_deinit()). wilc_deinit() will free the
> host_if_drv object which might still be used within the workqueue
> context.


Please try the code changes below in your test setup.


--- a/drivers/net/wireless/microchip/wilc1000/hif.c
+++ b/drivers/net/wireless/microchip/wilc1000/hif.c
@@ -495,12 +495,18 @@ static void handle_rcvd_ntwrk_info(struct work_struct *work)
  {
         struct host_if_msg *msg = container_of(work, struct host_if_msg, work);
         struct wilc_rcvd_net_info *rcvd_info = &msg->body.net_info;
-       struct wilc_user_scan_req *scan_req = &msg->vif->hif_drv->usr_scan_req;
+       struct host_if_drv *hif_drv = msg->vif->hif_drv;
+       struct wilc_user_scan_req *scan_req;
         const u8 *ch_elm;
         u8 *ies;
         int ies_len;
         size_t offset;

+       if (!hif_drv || !hif_drv->usr_scan_req.scan_result)
+               goto done;
+
+       scan_req = &hif_drv->usr_scan_req;
+
         if (ieee80211_is_probe_resp(rcvd_info->mgmt->frame_control))
                 offset = offsetof(struct ieee80211_mgmt, u.probe_resp.variable);
         else if (ieee80211_is_beacon(rcvd_info->mgmt->frame_control))
@@ -1574,6 +1580,9 @@ void wilc_network_info_received(struct wilc *wilc, u8 *buffer, u32 length)
                 return;
         }

+       if (!hif_drv->usr_scan_req.scan_result)
+               return;
+
         msg = wilc_alloc_work(vif, handle_rcvd_ntwrk_info, false);
         if (IS_ERR(msg))
                 return;

The above changes should avoid the kernel crash.
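
For illustration only, the guard added in the patch can be modelled in 
user space. This is a minimal sketch, not driver code: the *_sketch 
types, handle_network_info_sketch() and demo_sequence() are hypothetical 
stand-ins, and it assumes that teardown leaves either the hif_drv 
pointer or the scan_result callback set to NULL. It does not model 
concurrency; it only shows which states the two checks cover.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures (assumption). */
typedef void (*scan_result_fn)(int event, void *priv);

struct user_scan_req_sketch {
	scan_result_fn scan_result;	/* NULL when no scan is in progress */
	void *priv;
};

struct host_if_drv_sketch {
	struct user_scan_req_sketch usr_scan_req;
};

struct vif_sketch {
	struct host_if_drv_sketch *hif_drv;	/* NULL after teardown */
};

/* Returns 1 if the callback was invoked, 0 if the work item was dropped. */
static int handle_network_info_sketch(struct vif_sketch *vif, int event)
{
	struct host_if_drv_sketch *hif_drv = vif->hif_drv;

	/* Same shape as the check in the patch: bail out before any use. */
	if (!hif_drv || !hif_drv->usr_scan_req.scan_result)
		return 0;

	hif_drv->usr_scan_req.scan_result(event, hif_drv->usr_scan_req.priv);
	return 1;
}

/* Test callback: counts how many times it was called. */
static void count_result(int event, void *priv)
{
	(void)event;
	++*(int *)priv;
}

/* Walks through: scan active, scan finished, interface down. */
static int demo_sequence(void)
{
	int calls = 0;
	struct host_if_drv_sketch drv = {
		.usr_scan_req = { .scan_result = count_result, .priv = &calls },
	};
	struct vif_sketch vif = { .hif_drv = &drv };

	handle_network_info_sketch(&vif, 1);	/* callback runs */
	drv.usr_scan_req.scan_result = NULL;
	handle_network_info_sketch(&vif, 1);	/* dropped: no scan pending */
	vif.hif_drv = NULL;
	handle_network_info_sketch(&vif, 1);	/* dropped: interface is down */

	return calls;
}
```

Calling demo_sequence() from a main() returns 1: only the first work 
item, queued while the scan is active, reaches the callback.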


> BTW, ignore the "FW not responding" for now, that seems to be a
> different problem.

The "FW not responding" log indicates that the chip sleep command sent 
from the host to the FW failed. It is a temporary failure of that 
specific command. This log is often observed during the de-init 
process. IIRC, there was a change in the latest driver that reduced its 
frequency, but I am unable to recall the exact change.

Regards,
Ajay




