Re: Disconnect race in Gadget core

Hi,

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> writes:
> On Tue, May 11, 2021 at 11:22:51AM +0300, Felipe Balbi wrote:
>> Right, I'm arguing that, perhaps, ->udc_stop() is the one that should
>> have said semantics. For starters, 'stop' has a very clear meaning and,
>> considering my quick review of 3 or 4 UDC drivers, they are just masking
>> or releasing interrupts which would prevent ->suspend() and
>> ->disconnect() from being called. It could be, however, that if we
>> change the semantics of udc_stop to fit your description above,
>> ->udc_start() may have to change accordingly. Using dwc3 as an example,
>> here are the relevant implementations:
>> 
>> > static int dwc3_gadget_start(struct usb_gadget *g,
>> > 		struct usb_gadget_driver *driver)
>> > {
>> > 	struct dwc3		*dwc = gadget_to_dwc(g);
>> > 	unsigned long		flags;
>> > 	int			ret;
>> > 	int			irq;
>> >
>> > 	irq = dwc->irq_gadget;
>> > 	ret = request_threaded_irq(irq, dwc3_interrupt, dwc3_thread_interrupt,
>> > 			IRQF_SHARED, "dwc3", dwc->ev_buf);
>> 
>> request interrupt line and enable it. Prepare the udc to call gadget ops.
>> 
>> > 	if (ret) {
>> > 		dev_err(dwc->dev, "failed to request irq #%d --> %d\n",
>> > 				irq, ret);
>> > 		return ret;
>> > 	}
>> >
>> > 	spin_lock_irqsave(&dwc->lock, flags);
>> > 	dwc->gadget_driver	= driver;
>> 
>> internal pointer cached for convenience
>> 
>> > 	spin_unlock_irqrestore(&dwc->lock, flags);
>> >
>> > 	return 0;
>> > }
>> >
>> > static int dwc3_gadget_stop(struct usb_gadget *g)
>> > {
>> > 	struct dwc3		*dwc = gadget_to_dwc(g);
>> > 	unsigned long		flags;
>> >
>> > 	spin_lock_irqsave(&dwc->lock, flags);
>> > 	dwc->gadget_driver	= NULL;
>> > 	spin_unlock_irqrestore(&dwc->lock, flags);
>> >
>> > 	free_irq(dwc->irq_gadget, dwc->ev_buf);
>> 
>> drop the interrupt line. This makes the synchronize_irq() call
>> irrelevant in usb_gadget_remove_driver().
>
> I'm not so sure about this.  It seems like this whole thing arose when 
> the UDC core was created.  Before that, gadget drivers would register 

yes, that's correct.

> and unregister themselves by calling routines in the UDC driver (because 
> there was no core to manage things overall).  Thus the UDC driver knew 

right, before that we also didn't have platforms with more than one
UDC. To be frank, though, that was never really true, considering we
could order two net2272 PCI cards and stick them in the same PC.

> everything that was going on and could arrange to do things in the right 
> order.

right

> But now the UDC driver doesn't know about unregistrations/unbinding 
> until too late.

Some of this was changed recently, though. That was to cope with
USB-IF's mandate that pull-ups shouldn't be connected until VBUS is
above VBUS_VALID_THRESHOLD (v4.4). Some controllers, such as dwc3,
manage that internally (as far as I remember, although I see similar
constructs in dwc3 now), while others had to modify udc_start to cope
with this situation.

> So in dwc3, for example: At what point do you abort all outstanding 
> requests with -ESHUTDOWN status?  We don't want to do this before 

we do this as part of dwc3_remove_requests(). So, it's done either when
the relevant endpoint is disabled or as part of
dwc3_stop_active_transfers() which in turn is called from a (bus) reset
interrupt or when disconnecting pullups.
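
To make that concrete, here is a minimal userspace sketch of the "give back
everything with -ESHUTDOWN" pattern. It is not the dwc3 code: struct
sketch_req, struct sketch_ep and both functions are hypothetical stand-ins for
struct usb_request, struct usb_ep and the driver's internal helpers, and the
fallback ESHUTDOWN value is only there for non-Linux libcs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef ESHUTDOWN
#define ESHUTDOWN 108	/* Linux value; fallback for libcs that lack it */
#endif

/* Hypothetical stand-in for struct usb_request. */
struct sketch_req {
	int status;
	void (*complete)(struct sketch_req *req);
	struct sketch_req *next;
};

/* Hypothetical stand-in for struct usb_ep. */
struct sketch_ep {
	struct sketch_req *pending;	/* singly linked list of queued requests */
};

/* Queue a request at the head of the endpoint's pending list. */
static void sketch_ep_queue(struct sketch_ep *ep, struct sketch_req *req)
{
	assert(ep != NULL && req != NULL);
	req->status = -EINPROGRESS;
	req->next = ep->pending;
	ep->pending = req;
}

/*
 * Give back every pending request with -ESHUTDOWN. Each request is
 * unlinked before its completion callback runs and is not touched
 * afterwards, so the callback owns it and may free it.
 * Returns the number of requests that were aborted.
 */
static int sketch_remove_requests(struct sketch_ep *ep)
{
	int n = 0;

	while (ep->pending) {
		struct sketch_req *req = ep->pending;

		ep->pending = req->next;
		req->next = NULL;
		req->status = -ESHUTDOWN;
		if (req->complete)
			req->complete(req);
		n++;
	}
	return n;
}
```

In this model, both "endpoint disabled" and "stop active transfers" would end
up calling sketch_remove_requests() on each affected endpoint.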

> invoking the gadget driver's ->unbind callback.  Or do you rely on the 
> gadget driver to cancel the outstanding requests by itself?
>
> (In dummy-hcd, the udc_stop routine first calls stop_activity, which 
> nukes all outstanding requests, and afterward sets dum->driver to NULL.)

I see.

> The host-side API, which I admit may not be the greatest, does cancel 
> all outstanding URBs before calling the class driver's disconnect 
> routine -- unless the class driver sets the "soft_unbind" flag, in which 
> case we assume the driver will kill its own URBs properly.
>
> Suppose dwc3_gadget_stop was moved before the ->unbind callback.  Then 
> when the gadget driver cancelled its outstanding requests during unbind, 
> how could dwc3 do the completion callbacks with dwc->gadget_driver 
> already set to NULL?

That's fair :-)

>> I'm not against adding new udc methods to gadget ops, but it seems
>> fitting that udc_start/udc_stop would fit your description while some
>> new members could be given the task of priming the default control pipe
>> to receive the first SETUP packet.
>> 
>> What do you think?
>
> Starting things up when a new gadget driver binds doesn't seem to be so 
> much of a problem.  After all, the new driver isn't going to do anything 
> before the first SETUP packet arrives, since the gadget will be 

there could be an impact on power consumption, albeit a minimal one

> unconfigured.  Unbinding and shutting down are the hard parts.
>
> I guess the ideal approach would be:
>
> 	First, the UDC driver basically turns off the UDC hardware.
> 	This means no more IRQs will be generated.  But pending requests
> 	remain pending until they are explicitly cancelled.

right, that, I argue, is the responsibility of ->udc_stop()

> 	Second, the gadget driver's unbind callback runs.  It should
> 	cancel any outstanding requests and generally release resources.

correct. But that means we would require the gadget driver to initiate
the cancellation of outstanding requests.

> 	Third, the UDC driver WARNs about any requests that still exist
> 	and automatically releases them without doing any completion
> 	callbacks.  It also forgets about the gadget driver (this can't
> 	happen until after the gadget driver has cancelled its 
> 	requests).
>
> Right now we are doing the first two steps in the opposite order.  That 
> would be okay, provided we could guarantee there are no more 
> asynchronous callbacks once unbind is called (sort of like what Peter 
> has done for configfs).  But it would be better to do the steps in the 
> order shown above.  This does correspond to calling udc_stop first, as 
> you suggest.

right
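
Just to pin down the ordering we are talking about, here is a small userspace
model of the three steps. Everything in it is hypothetical (struct sketch_udc,
struct sketch_driver and the sketch_* functions are not UDC core API); it only
tracks counters so the ordering constraints are visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of a UDC and its bound gadget driver. */
struct sketch_udc;

struct sketch_driver {
	void (*unbind)(struct sketch_udc *udc);
};

struct sketch_udc {
	bool hw_running;	/* true while the controller may raise IRQs */
	int pending;		/* requests queued but not yet given back */
	int completions;	/* completion callbacks delivered */
	int leaked;		/* requests released without a callback */
	struct sketch_driver *driver;
};

/* Step 1: quiesce the hardware. Pending requests stay pending. */
static void sketch_udc_quiesce(struct sketch_udc *udc)
{
	udc->hw_running = false;
}

/* Cancel one request; the completion callback needs a bound driver. */
static int sketch_dequeue(struct sketch_udc *udc)
{
	if (!udc->driver || udc->pending == 0)
		return -1;
	udc->pending--;
	udc->completions++;
	return 0;
}

/* Step 3: warn about leftovers, drop them silently, forget the driver. */
static void sketch_udc_forget(struct sketch_udc *udc)
{
	if (udc->pending)
		fprintf(stderr, "WARN: %d request(s) still queued\n",
			udc->pending);
	udc->leaked += udc->pending;
	udc->pending = 0;
	udc->driver = NULL;
}

/* Example unbind callback that cancels everything still queued. */
static void sketch_unbind_cancel_all(struct sketch_udc *udc)
{
	while (sketch_dequeue(udc) == 0)
		;
}

/* The proposed order: quiesce, unbind (step 2), then forget. */
static void sketch_remove_driver(struct sketch_udc *udc)
{
	assert(udc->driver != NULL);
	sketch_udc_quiesce(udc);
	if (udc->driver->unbind)
		udc->driver->unbind(udc);
	sketch_udc_forget(udc);
}
```

Note that sketch_dequeue() fails once the driver pointer is gone, which is the
model's version of the problem with clearing dwc->gadget_driver before
->unbind() runs. That is why forgetting the driver has to be its own last step
here rather than part of the quiesce step.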

> But it also would mean splitting out the third step as something 
> separate from udc_stop.  Not to mention some potentially serious 
> updating of some UDC drivers.

yeah, it would take quite a bit of effort.

> On the other hand, there is something to be said for leaving the UDC 
> operational until after the unbind callback.  If the gadget driver 
> happened to be installing a new alternate setting at that time, say in a 
> workqueue thread, calls to activate new endpoints wouldn't suddenly get 
> unexpected errors.

Hmm, IIRC only the storage gadget defers work to another thread. What
you describe can also happen today, depending on how far into the future
the kthread is scheduled, no? So, how does the storage gadget behave with
that today?

-- 
balbi
