Re: wilc1000 kernel crash

Michael Walle <mwalle@xxxxxxxxxx> · Fri, 9 Dec 2022 13:03:43 +0100

On 22/10/26 10:54, Michael Walle wrote:
> Hi Ajay,
> 
> On 22/10/25 08:26, Ajay.Kathat@xxxxxxxxxxxxx wrote:
> > > In handle_rcvd_ntwrk_info() scan_req->scan_result isn't valid anymore,
> > > although it doesn't contain NULL. Thus the driver is calling into a
> > > bogus function pointer. There seems to be no locking between the
> > > asynchronous calls within the workqueue (wilc_enqueue_work()) and when
> > > the interface is disabled (wilc_deinit()). wilc_deinit() will free the
> > > host_if_drv object which might still be used within the workqueue
> > > context.
> > 
> > 
> > Please try the below code changes with your test setup environment.
> 
> The workqueue handling and wilc_deinit() run in parallel, correct? ..
> 
> > --- a/drivers/net/wireless/microchip/wilc1000/hif.c
> > +++ b/drivers/net/wireless/microchip/wilc1000/hif.c
> > @@ -495,12 +495,18 @@ static void handle_rcvd_ntwrk_info(struct 
> > work_struct *work)
> >   {
> >          struct host_if_msg *msg = container_of(work, struct 
> > host_if_msg, work);
> >          struct wilc_rcvd_net_info *rcvd_info = &msg->body.net_info;
> > -       struct wilc_user_scan_req *scan_req = 
> > &msg->vif->hif_drv->usr_scan_req;
> > +       struct host_if_drv *hif_drv = msg->vif->hif_drv;
> > +       struct wilc_user_scan_req *scan_req;
> >          const u8 *ch_elm;
> >          u8 *ies;
> >          int ies_len;
> >          size_t offset;
> > 
> > +       if (!hif_drv || !hif_drv->usr_scan_req.scan_result)
> > +               goto done;
> 
> .. So what if hif_drv will be set to NULL right after this check?
> 
> I don't think you'll get around using proper locking here.
> 
> > +
> > +       scan_req = &hif_drv->usr_scan_req;
> > +
> >          if (ieee80211_is_probe_resp(rcvd_info->mgmt->frame_control))
> >                  offset = offsetof(struct ieee80211_mgmt, 
> > u.probe_resp.variable);
> >          else if (ieee80211_is_beacon(rcvd_info->mgmt->frame_control))
> > @@ -1574,6 +1580,9 @@ void wilc_network_info_received(struct wilc *wilc, 
> > u8 *buffer, u32 length)
> >                  return;
> >          }
> > 
> > +       if (!hif_drv->usr_scan_req.scan_result)
> > +               return;
> > +
> 
> This is also racy. What if scan_result is cleared right after this
> check? Then the work item will still get added to the work queue.
> 
> Here is the call tree:
> 
> isr_bh_routine() [interrupt thread ctx]
>   wilc_handle_isr()
>     mutex_lock(hif_cs)
>     wilc_wlan_handle_isr_ext()
> 	  wilc_wlan_handle_rxq()
> 	    wilc_wlan_handle_rx_buff()
>           wilc_wlan_cfg_indicate_rx()
>             wilc_network_info_received()
> 			  wilc_enqueue_work(handle_rcvd_ntwrk_info)
>     mutex_unlock(hif_cs)
> 
> handle_rcvd_ntwrk_info [hif_workqueue ctx]
>   if (scan_result)
>     scan_result()
> 
> wilc_mac_close() [ioctl ctx?]
>   wilc_deinit_host_int()
>     wilc_deinit()
> 	mutex_lock(deinit_lock)
> 	if (scan_result)
> 	  scan_result()
> 	  scan_result = NULL
> 	kfree(hif_drv)
> 	hif_drv = NULL
> 	mutex_unlock(deinit_lock)
> 
> I don't see any synchronization mechanisms, between these threads.
> 
> >          msg = wilc_alloc_work(vif, handle_rcvd_ntwrk_info, false);
> >          if (IS_ERR(msg))
> >                  return;
> > 
> > The above changes should avoid the kernel crash exception.
> 
> As mentioned above, I think this will just decrease the chance that
> it is happening. Nonetheless, I've tried your changes but it doesn't
> fix the crash.

Any news here?

Thanks,
-michael

> > > BTW, ignore the "FW not repsonding" for now, that seems to be a
> > > different problem.
> > 
> > "FW not responding" log indicates the chip sleep command failure from 
> > Host to the FW. It's a temporary failure log for specific command. 
> > During the de-init process, this logs is often observed. IIRC, there was 
> > a change in the latest driver that reduced its frequency but I am unable 
> > to recall the exact change.
> 
> So what is a "temporary failure"? It is a pr_warn() and customers
> (rightfully) complains about them. Apart from that, it only happens on
> the second "ifconfig wlan0 up". There are no messages during the first
> one. So I suspect there is still something fishy.
> 
> -michael