On Mon, 13 Jan 2020, Andrey Konovalov wrote: > On Mon, Jan 13, 2020 at 6:34 PM Andrey Konovalov <andreyknvl@xxxxxxxxxx> wrote: > > > > On Mon, Jan 13, 2020 at 5:50 PM Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote: > > > > > > On Mon, 13 Jan 2020, Andrey Konovalov wrote: > > > > > > > I've also found an issue, but I'm not sure if that is the bug in Raw > > > > Gadget, or in the gadget layer (in the former case I'll add this fix > > > > to v5 as well). What I believe I'm seeing is > > > > __fput()->usb_gadget_unregister_driver()->usb_gadget_remove_driver()->gadget_unbind() > > > > racing with dummy_timer()->gadget_setup(). In my case it results in > > > > gadget_unbind() doing set_gadget_data(gadget, NULL), and then > > > > gadget_setup() dereferencing get_gadget_data(gadget). > > > > > > > > Alan, does it look possible for those two functions to race? Should > > > > this be prevented by the gadget layer, or should I use some kind of > > > > locking in my gadget driver to prevent this? > > > > > > In your situation this race shouldn't happen, because before > > > udc->driver->unbind() is invoked we call usb_gadget_disconnect(). If > > > that routine succeeds -- which it always does under dummy-hcd -- then > > > there can't be any more setup callbacks, because find_endpoint() will > > > always return NULL (the is_active() test fails; see the various > > > set_link_state* routines). So I don't see how you could have ended up > > > with the race you describe. > > > > I've managed to reproduce the race by adding an mdelay() into the > > beginning of the setup() callback. AFAIU what happens is setup() gets > > called (and waits on the mdelay()), then unbind() comes in and does > > set_gadget_data(NULL), and then setup() proceeds, gets NULL through > > get_gadget_data() and crashes on null-ptr-deref. I've got the same > > crash a few times after many days of fuzzing, so I assume it can > > happen without the mdelay() as well. > > > > > However, a real UDC might not be able to perform a disconnect under > > > software control. In that case usb_gadget_disconnect() would not > > > change the pullup state, and there would be a real possibility of a > > > setup callback racing with an unbind callback. This seems like a > > > genuine problem and I can't think of a solution offhand. > > > > > > What we would need is a way to tell the UDC driver to stop invoking > > > gadget callbacks, _before_ the UDC driver's stop callback gets called. > > > Maybe this should be merged into the pullup callback somehow. > > Perhaps for the dummy driver we need to wait for setup() to finish if > it's being executed and then stop the dummy timer in dummy_pullup()? Yes, we need to wait for a setup callback to finish. But dummy_timer should not be stopped; otherwise URBs that have already been submitted would never be given back. The big problem is that usb_gadget_disconnect() can be called in interrupt context. In general, a UDC driver will need to call synchronize_irq() to make sure there aren't any callbacks still running, and that can be done only in process context. dummy-hcd is slightly different since it doesn't manage actual hardware; it calls usleep_range() instead of synchronize_irq() -- but that also requires process context. We could change the gadget API to require that usb_gadget_disconnect() and related routines always be called in process context. I don't know if that's such a good idea though. A gadget driver might want to disconnect from within its setup handler or a completion routine, for example. A better approach would be to add a new synchronize_callbacks() pointer in the usb_gadget_ops structure. But to work properly it would have to be mandatory for every UDC driver, and that's not so easy to accomplish. Suggestions? Alan Stern