> -----Original Message----- > From: Jonas Gorski <jonas.gorski@xxxxxxxxx> > Sent: Thursday, March 23, 2023 4:52 AM > To: Larry Finger <Larry.Finger@xxxxxxxxxxxx> > Cc: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>; netdev@xxxxxxxxxxxxxxx; linux-wireless@xxxxxxxxxxxxxxx; Ping-Ke > Shih <pkshih@xxxxxxxxxxx> > Subject: Re: [BUG v6.2.7] Hitting BUG_ON() on rtw89 wireless driver startup > > On Wed, 22 Mar 2023 at 18:03, Larry Finger <Larry.Finger@xxxxxxxxxxxx> wrote: > > > > On 3/22/23 10:54, Hyeonggon Yoo wrote: > > > > > > Hello folks, > > > I've just encountered weird bug when booting Linux v6.2.7 > > > > > > config: attached > > > dmesg: attached > > > > > > I'm not sure exactly how to trigger this issue yet because it's not > > > stably reproducible. (just have encountered randomly when logging in) > > > > > > At quick look it seems to be related to rtw89 wireless driver or network subsystem. > > > > Your bug is weird indeed, and it does come from rtw89_8852be. My distro has not > > yet released kernel 6.2.7, but I have not seen this problem with mainline > > kernels throughout the 6.2 or 6.3 development series. > > Looking at the rtw89 driver's probe function, the bug is probably a > simple race condition: > > int rtw89_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) > { > ... > ret = rtw89_core_register(rtwdev); <- calls ieee80211_register_hw(); > ... > rtw89_core_napi_init(rtwdev); > ... > } > > so it registers the wifi device first, making it visible to userspace, > and then initializes napi. > > So there is a window where a fast userspace may already try to > interact with the device before the driver got around to initializing > the napi parts, and then it explodes. At least that is my theory for > the issue. > > Switching the order of these two functions should avoid it in theory, > as long as rtw89_core_napi_init() doesn't depend on anything > rtw89_core_register() does. > > FWIW, registering the irq handler only after registering the device > also seems suspect, and should probably also happen before that. Adding a 10 seconds sleep between rtw89_core_register() and rtw89_core_napi_init(), and I can reproduce this issue: int rtw89_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) { ... ret = rtw89_core_register(rtwdev); ... msleep(10 * 100); ... rtw89_core_napi_init(rtwdev); ... } And, as your suggestion, I move the rtw89_core_register() to the last step of PCI probe. Then, it looks more reasonable that we prepare NAPI and interrupt handlers before registering netdev. Could you give it a try with below fix? diff --git a/pci.c b/pci.c index fe6c0efc..087de2e0 100644 --- a/pci.c +++ b/pci.c @@ -3879,25 +3879,26 @@ int rtw89_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) rtw89_pci_link_cfg(rtwdev); rtw89_pci_l1ss_cfg(rtwdev); - ret = rtw89_core_register(rtwdev); - if (ret) { - rtw89_err(rtwdev, "failed to register core\n"); - goto err_clear_resource; - } - rtw89_core_napi_init(rtwdev); ret = rtw89_pci_request_irq(rtwdev, pdev); if (ret) { rtw89_err(rtwdev, "failed to request pci irq\n"); - goto err_unregister; + goto err_deinit_napi; + } + + ret = rtw89_core_register(rtwdev); + if (ret) { + rtw89_err(rtwdev, "failed to register core\n"); + goto err_free_irq; } return 0; -err_unregister: +err_free_irq: + rtw89_pci_free_irq(rtwdev, pdev); +err_deinit_napi: rtw89_core_napi_deinit(rtwdev); - rtw89_core_unregister(rtwdev); err_clear_resource: rtw89_pci_clear_resource(rtwdev, pdev); err_declaim_pci: -- Ping-Ke