On Fri, 17 Mar 2017, Guenter Roeck wrote: > While running a bind/unbind stress test with the dwc3 usb driver on rk3399, > the following crash was observed. > > Unable to handle kernel NULL pointer dereference at virtual address 00000218 > pgd = ffffffc00165f000 > [00000218] *pgd=000000000174f003, *pud=000000000174f003, > *pmd=0000000001750003, *pte=00e8000001751713 > Internal error: Oops: 96000005 [#1] PREEMPT SMP > Modules linked in: uinput uvcvideo videobuf2_vmalloc cmac > ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat rfcomm > xt_mark fuse bridge stp llc zram btusb btrtl btbcm btintel bluetooth > ip6table_filter mwifiex_pcie mwifiex cfg80211 cdc_ether usbnet r8152 mii joydev > snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device ppp_async > ppp_generic slhc tun > CPU: 1 PID: 29814 Comm: kworker/1:1 Not tainted 4.4.52 #507 > Hardware name: Google Kevin (DT) > Workqueue: pm pm_runtime_work > task: ffffffc0ac540000 ti: ffffffc0af4d4000 task.ti: ffffffc0af4d4000 > PC is at autosuspend_check+0x74/0x174 > LR is at autosuspend_check+0x70/0x174 > ... > Call trace: > [<ffffffc00080dcc0>] autosuspend_check+0x74/0x174 > [<ffffffc000810500>] usb_runtime_idle+0x20/0x40 > [<ffffffc000785ae0>] __rpm_callback+0x48/0x7c > [<ffffffc000786af0>] rpm_idle+0x1e8/0x498 > [<ffffffc000787cdc>] pm_runtime_work+0x88/0xcc > [<ffffffc000249bb8>] process_one_work+0x390/0x6b8 > [<ffffffc00024abcc>] worker_thread+0x480/0x610 > [<ffffffc000251a80>] kthread+0x164/0x178 > [<ffffffc0002045d0>] ret_from_fork+0x10/0x40 > > Source: > > (gdb) l *0xffffffc00080dcc0 > 0xffffffc00080dcc0 is in autosuspend_check > (drivers/usb/core/driver.c:1778). > 1773 /* We don't need to check interfaces that are > 1774 * disabled for runtime PM. Either they are unbound > 1775 * or else their drivers don't support autosuspend > 1776 * and so they are permanently active. > 1777 */ > 1778 if (intf->dev.power.disable_depth) > 1779 continue; > 1780 if (atomic_read(&intf->dev.power.usage_count) > 0) > 1781 return -EBUSY; > 1782 w |= intf->needs_remote_wakeup; > > Code analysis shows that intf is set to NULL in usb_disable_device() prior > to setting actconfig to NULL. At the same time, usb_runtime_idle() does not > lock the usb device, and neither does any of the functions in the > traceback. This means that there is no protection against a race condition > where usb_disable_device() is removing dev->actconfig->interface[] pointers > while those are being accessed from autosuspend_check() and possibly by > other callers. That is an interesting race, one that had previously escaped my notice. > Explicitly disable autosuspend in usb_disconnect() before calling > usb_disable_device(). This doesn't fix the race for good, but it ensures > that the pm runtime worker doesn't call usb_runtime_idle() on the interface > that is being removed, and thus avoids the race in the affected code path. > > Signed-off-by: Guenter Roeck <linux@xxxxxxxxxxxx> > --- > This is another interesting situation. As mentioned above, the patch doesn't > really fix the race problem. On the other side, fixing it for good would > (probably) be much more complex. I still see the race after applying this > patch, but it happens maybe once a day vs. several times per hour. > > Marked as RFC in the hope that someone has an idea for a better fix. > I tried clearing udev->actconfig prior to removing the interfaces > in usb_disable_device(), but that alone didn't help; it does not > resolve the race condition either, and still results in the crash. > > The only clean solution I can think of would be to protect accesses > to dev->actconfig with a spinlock or mutex, and to make sure that the > lock is held during read accesses and that dev->actconfig is cleared > before releasing the lock on write accesses. I'll be happy to do that > if it is the way to go, but I would like some feedback before I give it > a try. I think the right thing to do is test udev->state at the start of autosuspend_check(), much like the test at the start of usb_suspend_both(). udev->state gets set to USB_STATE_NOTATTACHED in usb_disconnect() before usb_disable_device() is called. Then there will be no need for usb_disable_autosuspend(), although pm_runtime_barrier() will still be necessary. Alan Stern > drivers/usb/core/hub.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c > index 5286bf67869a..5a420657f9f7 100644 > --- a/drivers/usb/core/hub.c > +++ b/drivers/usb/core/hub.c > @@ -2093,6 +2093,15 @@ void usb_disconnect(struct usb_device **pdev) > * so that the hardware is now fully quiesced. > */ > dev_dbg(&udev->dev, "unregistering device\n"); > + > + /* > + * Disable autosuspend before disabling the device, and make sure > + * that autosuspend doesn't touch it while it is in the process > + * of being deleted. > + */ > + usb_disable_autosuspend(udev); > + pm_runtime_barrier(&udev->dev); > + > usb_disable_device(udev, 0); > usb_hcd_synchronize_unlinks(udev); > > -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html