On Mon, Jun 28, 2021 at 05:36:22PM +0800, linyyuan@xxxxxxxxxxxxxx wrote: > On 2021-06-27 22:09, Alan Stern wrote: > > > > CPU0 CPU1 > > ---- ---- > > usb_gadget_remove_driver runs > > Calls synchronize_irq > > synchronize_irq returns > > Calls udc_driver_unbind > > IRQ happens for disconnect > > Handler unlocks dwc->lock > > Calls dwc->gadget_driver->disconnect > > Gadget driver has already been unbound > > and is not prepared to handle a > > callback, so it crashes > > Calls usb_gadget_udc_stop > > dwc->gadget_driver is > > set to NULL > > > > Without the async_callbacks mechanism, the gadget driver can get a > > callback at the wrong time (after it has been unbound), which might > > cause it to crash. > 1. do you think we need to back to my original patch, > https://lore.kernel.org/linux-usb/20210619154309.52127-1-linyyuan@xxxxxxxxxxxxxx/T/#t No, I think the async_callbacks approach is slightly better. > i think we can add the spin lock or mutex lock to protect this kind of race > from UDC layer, it will be easy understanding. We don't actually need a lock or mutex to fix this problem. We just need to make the remove_driver sequence issue two calls to the UDC driver rather than just one: async_callbacks and udc_stop. > 2. if you insist this kind of change, how to change following code in dwc3 ? > if (dwc->gadget_driver && dwc->gadget_driver->disconnect) { > > 2.1 if (dwc->async_callbacks && dwc->gadget_driver->disconnect) { > or > 2.2 if (dwc->async_callbacks && vdwc->gadget_driver && > dwc->gadget_driver->disconnect) { Either one would be okay. If async_callbacks is enabled then dwc->gadget_driver should never be NULL, but it won't hurt to check. After all, disconnect does not occur often and it doesn't need to run extremely quickly. Alan Stern