Re: Disconnect race in Gadget core

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> writes:
>> Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> writes:
>> > I just noticed this potential race in the Gadget core, specifically, in 
>> > usb_gadget_remove_driver.  Here's the relevant code:
>> >
>> > 	usb_gadget_disconnect(udc->gadget);
>> > 	if (udc->gadget->irq)
>> > 		synchronize_irq(udc->gadget->irq);
>> > 	udc->driver->unbind(udc->gadget);
>> > 	usb_gadget_udc_stop(udc);
>> >
>> > usb_gadget_disconnect will turn off the D+ pull-up, telling the host (or 
>> > upstream hub) that the gadget is no longer connected to the bus.  The 
>> > udc->driver->unbind call then tells the gadget driver that it will no 
>> > longer receive any callbacks from the UDC driver or the gadget core.
>> >
>> > Now suppose that at just this moment, the user unplugs the USB cable.  
>> > The UDC driver will notice that the Vbus voltage has gone away and will 
>> > invoke the gadget driver's ->disconnect callback.  After all, it doesn't 
>> > realize it should avoid making these callbacks until usb_gadget_udc_stop 
>> > has run.
>> >
>> > As a result, the gadget driver's disconnect callback may be invoked 
>> > _after_ the driver has been unbound from the gadget.
>> 
>> argh, nice catch!
>> 
>> > How should we fix this potential problem?
>> 
>> I think we can just move usb_gadget_udc_stop(). Looking at a few udcs,
>> they're just either freeing or masking UDC's interrupts which should
>> prevent the race you describe while also making sure that no further
>> interrupts will trigger.
>> 
>> Perhaps we could move udc_stop() before synchronize_irq(). Do you
>> foresee any issues with that for net2272 or dummy?
>
> I don't think it will be that easy.  As you may remember, there have 
> been a number of commits over the years fiddling with the order of 
> operations during gadget driver unbinding (although I don't think any of 
> them moved udc_stop from its position at the end).  Still, it's a 
> delicate matter.
>
> The purpose of synchronize_irq is to wait until any currently running 
> IRQ handlers have finished.  Obviously this doesn't do any good unless 
> the software has arranged beforehand that no new interrupt requests will 
> be generated.  In this case, turning off the pull-up is currently not 
> sufficient.  But waiting until after udc_stop has returned isn't the 
> answer; it wouldn't prevent callbacks from being issued between the 
> unbind and the udc_stop.
>
> And I'm quite sure that calling udc_stop before unbind would be wrong.  
> The gadget driver would then get various errors (like requests 
> completing with -ESHUTDOWN status) it's not prepared to handle.
>
> So what we need is a way to tell the UDC driver to stop issuing 
> callbacks.  The ones defined in struct usb_gadget_driver are:
>
> 	bind and unbind,
> 	bus suspend and bus resume,
> 	setup,
> 	bus reset,
> 	disconnect.
>
> Bind and unbind aren't relevant for this discussion because they are 
> synchronous, not invoked in response to interrupts.
>
> In theory there shouldn't be any bus-resume, setup, or bus-reset 
> interrupts once the pull-up is turned off, but it would be good to 
> protect against bad hardware which may produce them.
>
> Bus suspend is a real possibility.  It occurs when the UDC hasn't 
> received any packets for a few milliseconds, which certainly will be the 
> case when the pull-up is off.  UDC hardware and drivers may or may not 
> be smart enough to realize that under this circumstance, lack of packets 
> shouldn't be reported as a bus suspend.
>
> And of course, a cable disconnect can occur at any time -- nothing we 
> can do about that.
>
> Putting it all together, we need to tell the UDC driver, somewhere 
> between turning the pull-up off and doing the synchronize_irq, to stop 
> issuing disconnect (and possibly also setup and suspend) callbacks.  I 
> don't think we can use the pull-up call for this purpose; a gadget 
> driver may want for some reason to disconnect logically from the bus 
> while still knowing whether Vbus is present.  (You may disagree about 
> that, but otherwise what's the point of having a disconnect callback in 
> the first place?)
>
> Which means in the end that struct usb_gadget_ops needs either to have a 
> new callback for this purpose or to have an existing callback augmented 
> (for example, the ->pullup callback could get an additional argument 
> saying whether to continue issuing callbacks).  Or another possibility 
> is to make UDC drivers call a core routine to do disconnect callbacks 
> instead of issuing those callbacks themselves, and have the core filter 
> out callbacks that come at the wrong time.  Either way, every UDC driver 
> would need to be modified.
>
> What do you think?  Is this reasoning accurate, or have I missed 
> something?

Right, I'm arguing that, perhaps, ->udc_stop() is the one that should
have said semantics. For starters, 'stop' has a very clear meaning and,
considering my quick review of 3 or 4 UDC drivers, they are just masking
or releasing interrupts which would prevent ->suspend() and
->disconnect() from being called. It could be, however, that if we
change the semantics of udc_stop to fit your description above,
->udc_start() may have to change accordingly. Using dwc3 as an example,
here are the relevant implementations:

> static int dwc3_gadget_start(struct usb_gadget *g,
> 		struct usb_gadget_driver *driver)
> {
> 	struct dwc3		*dwc = gadget_to_dwc(g);
> 	unsigned long		flags;
> 	int			ret;
> 	int			irq;
>
> 	irq = dwc->irq_gadget;
> 	ret = request_threaded_irq(irq, dwc3_interrupt, dwc3_thread_interrupt,
> 			IRQF_SHARED, "dwc3", dwc->ev_buf);

request interrupt line and enable it. Prepare the udc to call gadget ops.

> 	if (ret) {
> 		dev_err(dwc->dev, "failed to request irq #%d --> %d\n",
> 				irq, ret);
> 		return ret;
> 	}
>
> 	spin_lock_irqsave(&dwc->lock, flags);
> 	dwc->gadget_driver	= driver;

internal pointer cached for convenience

> 	spin_unlock_irqrestore(&dwc->lock, flags);
>
> 	return 0;
> }
>
> static int dwc3_gadget_stop(struct usb_gadget *g)
> {
> 	struct dwc3		*dwc = gadget_to_dwc(g);
> 	unsigned long		flags;
>
> 	spin_lock_irqsave(&dwc->lock, flags);
> 	dwc->gadget_driver	= NULL;
> 	spin_unlock_irqrestore(&dwc->lock, flags);
>
> 	free_irq(dwc->irq_gadget, dwc->ev_buf);

drop the interrupt line. This makes the synchronize_irq() call
irrelevant in usb_gadget_remove_driver().

I'm not against adding new udc methods to gadget ops, but it seems
fitting that udc_start/udc_stop would fit your description while some
new members could be given the task of priming the default control pipe
to receive the first SETUP packet.

What do you think?

-- 
balbi

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux