Hi, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> writes: >> Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> writes: >> > I just noticed this potential race in the Gadget core, specifically, in >> > usb_gadget_remove_driver. Here's the relevant code: >> > >> > usb_gadget_disconnect(udc->gadget); >> > if (udc->gadget->irq) >> > synchronize_irq(udc->gadget->irq); >> > udc->driver->unbind(udc->gadget); >> > usb_gadget_udc_stop(udc); >> > >> > usb_gadget_disconnect will turn off the D+ pull-up, telling the host (or >> > upstream hub) that the gadget is no longer connected to the bus. The >> > udc->driver->unbind call then tells the gadget driver that it will no >> > longer receive any callbacks from the UDC driver or the gadget core. >> > >> > Now suppose that at just this moment, the user unplugs the USB cable. >> > The UDC driver will notice that the Vbus voltage has gone away and will >> > invoke the gadget driver's ->disconnect callback. After all, it doesn't >> > realize it should avoid making these callbacks until usb_gadget_udc_stop >> > has run. >> > >> > As a result, the gadget driver's disconnect callback may be invoked >> > _after_ the driver has been unbound from the gadget. >> >> argh, nice catch! >> >> > How should we fix this potential problem? >> >> I think we can just move usb_gadget_udc_stop(). Looking at a few udcs, >> they're just either freeing or masking UDC's interrupts which should >> prevent the race you describe while also making sure that no further >> interrupts will trigger. >> >> Perhaps we could move udc_stop() before synchronize_irq(). Do you >> foresee any issues with that for net2272 or dummy? > > I don't think it will be that easy. As you may remember, there have > been a number of commits over the years fiddling with the order of > operations during gadget driver unbinding (although I don't think any of > them moved udc_stop from its position at the end). Still, it's a > delicate matter. > > The purpose of synchronize_irq is to wait until any currently running > IRQ handlers have finished. Obviously this doesn't do any good unless > the software has arranged beforehand that no new interrupt requests will > be generated. In this case, turning off the pull-up is currently not > sufficient. But waiting until after udc_stop has returned isn't the > answer; it wouldn't prevent callbacks from being issued between the > unbind and the udc_stop. > > And I'm quite sure that calling udc_stop before unbind would be wrong. > The gadget driver would then get various errors (like requests > completing with -ESHUTDOWN status) it's not prepared to handle. > > So what we need is a way to tell the UDC driver to stop issuing > callbacks. The ones defined in struct usb_gadget_driver are: > > bind and unbind, > bus suspend and bus resume, > setup, > bus reset, > disconnect. > > Bind and unbind aren't relevant for this discussion because they are > synchronous, not invoked in response to interrupts. > > In theory there shouldn't be any bus-resume, setup, or bus-reset > interrupts once the pull-up is turned off, but it would be good to > protect against bad hardware which may produce them. > > Bus suspend is a real possibility. It occurs when the UDC hasn't > received any packets for a few milliseconds, which certainly will be the > case when the pull-up is off. UDC hardware and drivers may or may not > be smart enough to realize that under this circumstance, lack of packets > shouldn't be reported as a bus suspend. > > And of course, a cable disconnect can occur at any time -- nothing we > can do about that. > > Putting it all together, we need to tell the UDC driver, somewhere > between turning the pull-up off and doing the synchronize_irq, to stop > issuing disconnect (and possibly also setup and suspend) callbacks. I > don't think we can use the pull-up call for this purpose; a gadget > driver may want for some reason to disconnect logically from the bus > while still knowing whether Vbus is present. (You may disagree about > that, but otherwise what's the point of having a disconnect callback in > the first place?) > > Which means in the end that struct usb_gadget_ops needs either to have a > new callback for this purpose or to have an existing callback augmented > (for example, the ->pullup callback could get an additional argument > saying whether to continue issuing callbacks). Or another possibility > is to make UDC drivers call a core routine to do disconnect callbacks > instead of issuing those callbacks themselves, and have the core filter > out callbacks that come at the wrong time. Either way, every UDC driver > would need to be modified. > > What do you think? Is this reasoning accurate, or have I missed > something? Right, I'm arguing that, perhaps, ->udc_stop() is the one that should have said semantics. For starters, 'stop' has a very clear meaning and, considering my quick review of 3 or 4 UDC drivers, they are just masking or releasing interrupts which would prevent ->suspend() and ->disconnect() from being called. It could be, however, that if we change the semantics of udc_stop to fit your description above, ->udc_start() may have to change accordingly. Using dwc3 as an example, here are the relevant implementations: > static int dwc3_gadget_start(struct usb_gadget *g, > struct usb_gadget_driver *driver) > { > struct dwc3 *dwc = gadget_to_dwc(g); > unsigned long flags; > int ret; > int irq; > > irq = dwc->irq_gadget; > ret = request_threaded_irq(irq, dwc3_interrupt, dwc3_thread_interrupt, > IRQF_SHARED, "dwc3", dwc->ev_buf); request interrupt line and enable it. Prepare the udc to call gadget ops. > if (ret) { > dev_err(dwc->dev, "failed to request irq #%d --> %d\n", > irq, ret); > return ret; > } > > spin_lock_irqsave(&dwc->lock, flags); > dwc->gadget_driver = driver; internal pointer cached for convenience > spin_unlock_irqrestore(&dwc->lock, flags); > > return 0; > } > > static int dwc3_gadget_stop(struct usb_gadget *g) > { > struct dwc3 *dwc = gadget_to_dwc(g); > unsigned long flags; > > spin_lock_irqsave(&dwc->lock, flags); > dwc->gadget_driver = NULL; > spin_unlock_irqrestore(&dwc->lock, flags); > > free_irq(dwc->irq_gadget, dwc->ev_buf); drop the interrupt line. This makes the synchronize_irq() call irrelevant in usb_gadget_remove_driver(). I'm not against adding new udc methods to gadget ops, but it seems fitting that udc_start/udc_stop would fit your description while some new members could be given the task of priming the default control pipe to receive the first SETUP packet. What do you think? -- balbi
Attachment:
signature.asc
Description: PGP signature