Re: wilc1000 kernel crash

Michael Walle <mwalle@xxxxxxxxxx> · Wed, 26 Oct 2022 10:54:15 +0200

Hi Ajay,

On 22/10/25 08:26, Ajay.Kathat@xxxxxxxxxxxxx wrote:
> > In handle_rcvd_ntwrk_info() scan_req->scan_result isn't valid anymore,
> > although it doesn't contain NULL. Thus the driver is calling into a
> > bogus function pointer. There seems to be no locking between the
> > asynchronous calls within the workqueue (wilc_enqueue_work()) and when
> > the interface is disabled (wilc_deinit()). wilc_deinit() will free the
> > host_if_drv object which might still be used within the workqueue
> > context.
> 
> 
> Please try the below code changes with your test setup environment.

The workqueue handling and wilc_deinit() run in parallel, correct? ..

> --- a/drivers/net/wireless/microchip/wilc1000/hif.c
> +++ b/drivers/net/wireless/microchip/wilc1000/hif.c
> @@ -495,12 +495,18 @@ static void handle_rcvd_ntwrk_info(struct 
> work_struct *work)
>   {
>          struct host_if_msg *msg = container_of(work, struct 
> host_if_msg, work);
>          struct wilc_rcvd_net_info *rcvd_info = &msg->body.net_info;
> -       struct wilc_user_scan_req *scan_req = 
> &msg->vif->hif_drv->usr_scan_req;
> +       struct host_if_drv *hif_drv = msg->vif->hif_drv;
> +       struct wilc_user_scan_req *scan_req;
>          const u8 *ch_elm;
>          u8 *ies;
>          int ies_len;
>          size_t offset;
> 
> +       if (!hif_drv || !hif_drv->usr_scan_req.scan_result)
> +               goto done;

.. So what if hif_drv will be set to NULL right after this check?

I don't think you'll get around using proper locking here.

> +
> +       scan_req = &hif_drv->usr_scan_req;
> +
>          if (ieee80211_is_probe_resp(rcvd_info->mgmt->frame_control))
>                  offset = offsetof(struct ieee80211_mgmt, 
> u.probe_resp.variable);
>          else if (ieee80211_is_beacon(rcvd_info->mgmt->frame_control))
> @@ -1574,6 +1580,9 @@ void wilc_network_info_received(struct wilc *wilc, 
> u8 *buffer, u32 length)
>                  return;
>          }
> 
> +       if (!hif_drv->usr_scan_req.scan_result)
> +               return;
> +

This is also racy. What if scan_result is cleared right after this
check? Then the work item will still get added to the work queue.

Here is the call tree:

isr_bh_routine() [interrupt thread ctx]
  wilc_handle_isr()
    mutex_lock(hif_cs)
    wilc_wlan_handle_isr_ext()
	  wilc_wlan_handle_rxq()
	    wilc_wlan_handle_rx_buff()
          wilc_wlan_cfg_indicate_rx()
            wilc_network_info_received()
			  wilc_enqueue_work(handle_rcvd_ntwrk_info)
    mutex_unlock(hif_cs)

handle_rcvd_ntwrk_info [hif_workqueue ctx]
  if (scan_result)
    scan_result()

wilc_mac_close() [ioctl ctx?]
  wilc_deinit_host_int()
    wilc_deinit()
	mutex_lock(deinit_lock)
	if (scan_result)
	  scan_result()
	  scan_result = NULL
	kfree(hif_drv)
	hif_drv = NULL
	mutex_unlock(deinit_lock)

I don't see any synchronization mechanisms, between these threads.

>          msg = wilc_alloc_work(vif, handle_rcvd_ntwrk_info, false);
>          if (IS_ERR(msg))
>                  return;
> 
> The above changes should avoid the kernel crash exception.

As mentioned above, I think this will just decrease the chance that
it is happening. Nonetheless, I've tried your changes but it doesn't
fix the crash.

> > BTW, ignore the "FW not repsonding" for now, that seems to be a
> > different problem.
> 
> "FW not responding" log indicates the chip sleep command failure from 
> Host to the FW. It's a temporary failure log for specific command. 
> During the de-init process, this logs is often observed. IIRC, there was 
> a change in the latest driver that reduced its frequency but I am unable 
> to recall the exact change.

So what is a "temporary failure"? It is a pr_warn() and customers
(rightfully) complains about them. Apart from that, it only happens on
the second "ifconfig wlan0 up". There are no messages during the first
one. So I suspect there is still something fishy.

-michael