On Fri, 5 Jul 2019, Benjamin Herrenschmidt wrote: > (following our conversation) > > Here's a completely untested alternative patch (it replaces my previous > one) that fixes it a bit differently. > > This time it should handle the case of a disconnect happening > before we have dequeued a config change. > > This assumes that it's correct to never call > usb_composite_setup_continue() if an fsg_disable() happens after a > fsg_set_alt() and before we have processed the latter. That should be handled okay. If it isn't, the composite core needs to be fixed. > I will try to test it tomorrow if time permits, otherwise some time > next week: > --- > > [PATCH] usb: gadget: mass_storage: Fix races between fsg_disable and fsg_set_alt > > If fsg_disable() and fsg_set_alt() are called too closely to each > other (for example due to a quick reset/reconnect), what can happen > is that fsg_set_alt sets common->new_fsg from an interrupt while > handle_exception is trying to process the config change caused by > fsg_disable(): > > fsg_disable() > ... > handle_exception() > sets state back to FSG_STATE_NORMAL > hasn't yet called do_set_interface() > or is inside it. > > ---> interrupt > fsg_set_alt > sets common->new_fsg > queues a new FSG_STATE_CONFIG_CHANGE > <--- > > Now, the first handle_exception can "see" the updated > new_fsg, treats it as if it was a fsg_set_alt() response, > call usb_composite_setup_continue() etc... > > But then, the thread sees the second FSG_STATE_CONFIG_CHANGE, > and goes back down the same path, wipes and reattaches a now > active fsg, and .. calls usb_composite_setup_continue() which > at this point is wrong. > > Not only we get a backtrace, but I suspect the second set_interface > wrecks some state causing the host to get upset in my case. > > This fixes it by replacing "new_fsg" by a "state argument" (same > principle) which is set in the same lock section as the state > update, and retrieved similarly. > > That way, there is never any discrepancy between the dequeued > state and the observed value of it. We keep the ability to have > the latest reconfig operation take precedence, but we guarantee > that once "dequeued" the argument (new_fsg) will not be clobbered > by any new event. > > Signed-off-by: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> Yes, this looks just right. If I had thought about this a little more deeply earlier on, I would have come up with a patch very much like this. My only comments are cosmetic. > --- > drivers/usb/gadget/function/f_mass_storage.c | 26 ++++++++++++-------- > 1 file changed, 16 insertions(+), 10 deletions(-) > > diff --git a/drivers/usb/gadget/function/f_mass_storage.c b/drivers/usb/gadget/function/f_mass_storage.c > index 043f97ad8f22..2ef029413b01 100644 > --- a/drivers/usb/gadget/function/f_mass_storage.c > +++ b/drivers/usb/gadget/function/f_mass_storage.c > @@ -2285,16 +2292,14 @@ static int do_set_interface(struct fsg_common *common, struct fsg_dev *new_fsg) > static int fsg_set_alt(struct usb_function *f, unsigned intf, unsigned alt) > { > struct fsg_dev *fsg = fsg_from_func(f); While you're changing this, it would be nice to add the customary blank line here. > - fsg->common->new_fsg = fsg; > - raise_exception(fsg->common, FSG_STATE_CONFIG_CHANGE); > + __raise_exception(fsg->common, FSG_STATE_CONFIG_CHANGE, fsg); > return USB_GADGET_DELAYED_STATUS; > } > > static void fsg_disable(struct usb_function *f) > { > struct fsg_dev *fsg = fsg_from_func(f); And here. Otherwise: Acked-by: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> Alan Stern