Hi Michael,

On 24/10/22 19:24, Michael Walle wrote:
> Hi,
>
> I'm using the WILC1000 wifi chip in SDIO mode and with NetworkManager
> which seems to be probing the network in the background. What I am
> seeing is a kernel oops while processing the workqueue.
>
> This is on a kernel 5.15.74, but it also happens with the latest next,
> but not that often - I guess due to a different timing.
>
> My reduced steps to reproduce are the following:
> $ while true; do ifconfig wlan0 up; iw dev wlan0 scan & \
> ifconfig wlan0 down; done

It looks like a timing issue. While running the above steps, I observed
that most of the time the interface down command executes before the
scan command. Only occasionally does the scan command execute between
interface up and down.

>
> After a while I'll get the following splash:
>
> [ 487.955326] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> [ 487.955363] Mem abort info:
> [ 487.955366] ESR = 0x96000004
> [ 487.955370] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 487.965939] FW not responding
> [ 487.971033] SET = 0, FnV = 0
> [ 487.971039] EA = 0, S1PTW = 0
> [ 487.971043] FSC = 0x04: level 0 translation fault
> [ 487.971047] Data abort info:
> [ 487.971050] ISV = 0, ISS = 0x00000004
> [ 487.971053] CM = 0, WnR = 0
> [ 487.971059] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000497b0000
> [ 487.971066] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
> [ 487.971085] Internal error: Oops: 96000004 [#1] SMP
> [ 487.971094] Modules linked in:
> [ 487.971104] CPU: 1 PID: 9 Comm: kworker/u8:0 Not tainted 5.15.74-00013-g2d5897cb12ef #130
> [ 487.971113] Hardware name: NXP i.MX8MNano DDR3L EVK board (DT)
> [ 487.971122] Workqueue: WILC_wq handle_rcvd_ntwrk_info
> [ 488.035377] wilc1000_sdio mmc1:0001:1: chipid (001003a0)
> [ 487.971085] Internal error: Oops:
> [ 488.041180] 96000004 [#1] SMP
> [ 488.041186] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 488.041196] pc : handle_rcvd_ntwrk_info+0x7c/0xc4
> [ 488.041208] lr : handle_rcvd_ntwrk_info+0x70/0xc4
> [ 488.049128] wilc1000_sdio mmc1:0001:1: has_thrpt_enh3 = 1...
> [ 488.057273] sp : ffff80000a20bd70
> [ 488.057277] x29: ffff80000a20bd70 x28: 0000000000000000 x27: 0000000000000000
> [ 488.057289] x26: ffff000000118470 x25: ffff000005059d05 x24: ffff00000de94d30
> [ 488.057299] x23: 0000000000000000
> [ 488.062670] wilc1000_sdio mmc1:0001:1 wlan0: ChipID [1003a0] loading firmware [atmel/wilc1000_wifi_firmware-1.bin]
> [ 488.070418] x22: ffff000005059d00 x21: 0000000000000000
> [ 488.070428] x20: ffff00000de94d00 x19: ffff00000de94d28 x18: 0000000000000000
> [ 488.070440] x17: 0000000000000000 x16: 0000000000000000 x15: a4270000a4030001
> [ 488.070450] x14: 010102f2500018dd x13: 0018dd0000010002 x12: 0546000000000000
> [ 488.070461] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800008ad92a0
> [ 488.150644] x8 : fefefefefefefeff x7 : 0000000000000018 x6 : ffff0000063e88c0
> [ 488.157799] x5 : 0000000000000000 x4 : 0000000000000003 x3 : 0000000000000000
> [ 488.164947] x2 : 0000000000000000 x1 : 0000000000000001 x0 : 0000000000000001
> [ 488.172095] Call trace:
> [ 488.174548] handle_rcvd_ntwrk_info+0x7c/0xc4
> [ 488.175400] FW not responding
> [ 488.178927] process_one_work+0x1ec/0x48c
> [ 488.178941] worker_thread+0x170/0x564
> [ 488.178948] kthread+0x128/0x13c
> [ 488.178959] ret_from_fork+0x10/0x20
> [ 488.185280] FW not responding
> [ 488.185958] Code: 9415ea8e b4000060 39400401 35000201 (f94002a3)
> [ 488.192042] FW not responding
> [ 488.192957] ---[ end trace fa915dc840cf0355 ]---
> [ 488.199700] FW not responding
> [ 488.205601] Kernel panic - not syncing: Oops: Fatal exception
> [ 488.205608] SMP: stopping secondary CPUs
> [ 488.205630] Kernel Offset: disabled
> [ 488.205634] CPU features: 0x00002001,20000846
> [ 488.205642] Memory Limit: none
>
> In handle_rcvd_ntwrk_info() scan_req->scan_result isn't valid anymore,
> although it doesn't contain NULL. Thus the driver is calling into a
> bogus function pointer. There seems to be no locking between the
> asynchronous calls within the workqueue (wilc_enqueue_work()) and when
> the interface is disabled (wilc_deinit()). wilc_deinit() will free the
> host_if_drv object which might still be used within the workqueue
> context.

Please try the code changes below in your test setup.

--- a/drivers/net/wireless/microchip/wilc1000/hif.c
+++ b/drivers/net/wireless/microchip/wilc1000/hif.c
@@ -495,12 +495,18 @@ static void handle_rcvd_ntwrk_info(struct work_struct *work)
 {
 	struct host_if_msg *msg = container_of(work, struct host_if_msg, work);
 	struct wilc_rcvd_net_info *rcvd_info = &msg->body.net_info;
-	struct wilc_user_scan_req *scan_req = &msg->vif->hif_drv->usr_scan_req;
+	struct host_if_drv *hif_drv = msg->vif->hif_drv;
+	struct wilc_user_scan_req *scan_req;
 	const u8 *ch_elm;
 	u8 *ies;
 	int ies_len;
 	size_t offset;
 
+	if (!hif_drv || !hif_drv->usr_scan_req.scan_result)
+		goto done;
+
+	scan_req = &hif_drv->usr_scan_req;
+
 	if (ieee80211_is_probe_resp(rcvd_info->mgmt->frame_control))
 		offset = offsetof(struct ieee80211_mgmt, u.probe_resp.variable);
 	else if (ieee80211_is_beacon(rcvd_info->mgmt->frame_control))
@@ -1574,6 +1580,9 @@ void wilc_network_info_received(struct wilc *wilc, u8 *buffer, u32 length)
 		return;
 	}
 
+	if (!hif_drv->usr_scan_req.scan_result)
+		return;
+
 	msg = wilc_alloc_work(vif, handle_rcvd_ntwrk_info, false);
 	if (IS_ERR(msg))
 		return;

The above changes should avoid the kernel crash.

> BTW, ignore the "FW not responding" for now, that seems to be a
> different problem.

The "FW not responding" log indicates that the chip sleep command from
the host to the FW failed. It is a temporary failure log for that
specific command and is often observed during the de-init process.
IIRC, there was a change in the latest driver that reduced its
frequency, but I am unable to recall the exact change.

Regards,
Ajay
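
P.S. In case it helps when reviewing the change, the guard added in handle_rcvd_ntwrk_info() can be sketched in user space roughly as below. This is only an illustration under simplifying assumptions: `scan_req_sketch`, `hif_drv_sketch`, `handle_info_sketch` and `count_results` are hypothetical stand-ins, not the driver's definitions. It checks the pointers for NULL before calling through them and does nothing more; in particular it adds no locking between the workqueue and wilc_deinit().

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver's structures (assumption:
 * the real wilc1000 definitions have more fields and other types). */
typedef void (*scan_result_fn)(int event, void *priv);

struct scan_req_sketch {
	scan_result_fn scan_result;	/* NULL when no scan is active */
	void *priv;
};

struct hif_drv_sketch {
	struct scan_req_sketch usr_scan_req;
};

/* Same shape as the check in the patch: drop the message when the
 * hif_drv pointer or the scan callback is NULL instead of calling
 * through it. Returns 0 if the callback ran, -1 if it was dropped. */
static int handle_info_sketch(struct hif_drv_sketch *hif_drv, int event)
{
	if (!hif_drv || !hif_drv->usr_scan_req.scan_result)
		return -1;

	hif_drv->usr_scan_req.scan_result(event, hif_drv->usr_scan_req.priv);
	return 0;
}

/* Example callback: counts how many results were delivered. */
static void count_results(int event, void *priv)
{
	(void)event;
	++*(int *)priv;
}
```

With `count_results` installed as the callback, handle_info_sketch() delivers the event and returns 0. After the callback is cleared, or when the hif_drv pointer itself is NULL, it returns -1 without calling anything.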