Re: Disconnect race in Gadget core

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, May 10, 2021 at 07:43:00PM +0300, Felipe Balbi wrote:
> 
> Hi Alan,
> 
> Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> writes:
> > I just noticed this potential race in the Gadget core, specifically, in 
> > usb_gadget_remove_driver.  Here's the relevant code:
> >
> > 	usb_gadget_disconnect(udc->gadget);
> > 	if (udc->gadget->irq)
> > 		synchronize_irq(udc->gadget->irq);
> > 	udc->driver->unbind(udc->gadget);
> > 	usb_gadget_udc_stop(udc);
> >
> > usb_gadget_disconnect will turn off the D+ pull-up, telling the host (or 
> > upstream hub) that the gadget is no longer connected to the bus.  The 
> > udc->driver->unbind call then tells the gadget driver that it will no 
> > longer receive any callbacks from the UDC driver or the gadget core.
> >
> > Now suppose that at just this moment, the user unplugs the USB cable.  
> > The UDC driver will notice that the Vbus voltage has gone away and will 
> > invoke the gadget driver's ->disconnect callback.  After all, it doesn't 
> > realize it should avoid making these callbacks until usb_gadget_udc_stop 
> > has run.
> >
> > As a result, the gadget driver's disconnect callback may be invoked 
> > _after_ the driver has been unbound from the gadget.
> 
> argh, nice catch!
> 
> > How should we fix this potential problem?
> 
> I think we can just move usb_gadget_udc_stop(). Looking at a few udcs,
> they're just either freeing or masking UDC's interrupts which should
> prevent the race you describe while also making sure that no further
> interrupts will trigger.
> 
> Perhaps we could move udc_stop() before synchronize_irq(). Do you
> foresee any issues with that for net2272 or dummy?

I don't think it will be that easy.  As you may remember, there have 
been a number of commits over the years fiddling with the order of 
operations during gadget driver unbinding (although I don't think any of 
them moved udc_stop from its position at the end).  Still, it's a 
delicate matter.

The purpose of synchronize_irq is to wait until any currently running 
IRQ handlers have finished.  Obviously this doesn't do any good unless 
the software has arranged beforehand that no new interrupt requests will 
be generated.  In this case, turning off the pull-up is currently not 
sufficient.  But waiting until after udc_stop has returned isn't the 
answer; it wouldn't prevent callbacks from being issued between the 
unbind and the udc_stop.

And I'm quite sure that calling udc_stop before unbind would be wrong.  
The gadget driver would then get various errors (like requests 
completing with -ESHUTDOWN status) it's not prepared to handle.

So what we need is a way to tell the UDC driver to stop issuing 
callbacks.  The ones defined in struct usb_gadget_driver are:

	bind and unbind,
	bus suspend and bus resume,
	setup,
	bus reset,
	disconnect.

Bind and unbind aren't relevant for this discussion because they are 
synchronous, not invoked in response to interrupts.

In theory there shouldn't be any bus-resume, setup, or bus-reset 
interrupts once the pull-up is turned off, but it would be good to 
protect against bad hardware which may produce them.

Bus suspend is a real possibility.  It occurs when the UDC hasn't 
received any packets for a few milliseconds, which certainly will be the 
case when the pull-up is off.  UDC hardware and drivers may or may not 
be smart enough to realize that under this circumstance, lack of packets 
shouldn't be reported as a bus suspend.

And of course, a cable disconnect can occur at any time -- nothing we 
can do about that.

Putting it all together, we need to tell the UDC driver, somewhere 
between turning the pull-up off and doing the synchronize_irq, to stop 
issuing disconnect (and possibly also setup and suspend) callbacks.  I 
don't think we can use the pull-up call for this purpose; a gadget 
driver may want for some reason to disconnect logically from the bus 
while still knowing whether Vbus is present.  (You may disagree about 
that, but otherwise what's the point of having a disconnect callback in 
the first place?)

Which means in the end that struct usb_gadget_ops needs either to have a 
new callback for this purpose or to have an existing callback augmented 
(for example, the ->pullup callback could get an additional argument 
saying whether to continue issuing callbacks).  Or another possibility 
is to make UDC drivers call a core routine to do disconnect callbacks 
instead of issuing those callbacks themselves, and have the core filter 
out callbacks that come at the wrong time.  Either way, every UDC driver 
would need to be modified.

What do you think?  Is this reasoning accurate, or have I missed 
something?

Alan Stern



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux