On Sat, Jul 22, 2023 at 8:57 PM Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote: > > On Fri, Jul 21, 2023 at 03:44:52PM -0700, Avichal Rakesh wrote: > > Thank you both, Dan and Alan, for your comments! > > > > On Fri, Jul 21, 2023 at 12:32 PM Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote: > > > > > > On Fri, Jul 21, 2023 at 04:57:51PM +0100, Dan Scally wrote: > > > > Hi Avichal - thanks for all the detail > > > > > > > > > A dirty Patch for option 2 is attached below which skips calling > > > > > usb_function_deactivate if uvc_function_disable was called before. It seems > > > > > to work okay in testing. Let me know if the analysis and solutions seems okay > > > > > and I can upload a formal patch. > > > > > > > > > > > > For what it's worth the analysis makes sense; the patch looks ok to me so if > > > > the conclusion is to fix the problem that way I think it's fine, but I'm > > > > more inclined to consider this a locking problem in core - it'd be better to > > > > fix it there I think. > > > > > > I'm not so sure that handling this in the core is feasible. Removing > > > the driver obviously needs to be synchronized with deactivation, since > > > the two actions affect the same parts of the state (i.e., the pull-ups > > > and the "connected" flag). > > > > I don't have the full context on what caused the locking to be added, > > but now that it > > in place, it seems like there needs to be a clarification of > > expectation between core > > and the gadget drivers. Is it valid for the gadget drivers to call > > usb_gadget_deactivate (and similar functions) as a part of disable/unbind > > (in terms of API/expectations)? > > > > 1. If yes, maybe core can track when it is in the middle of resetting and > > drop calls to usb_gadget_deactivate if called in the middle of the > > disconnect--->unbind sequence. This is effectively what the patch above > > does in UVC driver, but core might (with some extra state) have stronger > > guarantees of when a call is redundant and can be safely dropped. > > > > 2. If no, then it becomes the gadget's responsibility to ensure that it doesn't > > call any of the usb_gadget_* functions when disabling/unbinding. However, it > > does require core to provide some concrete rules around when things are safe > > to call, and when they aren't. > > > > > > > > Consequently I don't see how to avoid a deadlock if the driver's unbind > > > callback does a deactivate. Besides, as the patch mentions, doing so is > > > never necessary. > > > > > > However, even with that call removed we could still have a problem. I > > > don't know much about how the UVC function driver works, but it would be > > > reasonable for the driver to have a private mutex that gets held both > > > during unbind and when the user application closes the V4L2 node. Then > > > there's an ABBA locking issue: > > > > > > Unbind: The UDC core holds connect_lock while calling the UVC > > > unbind handler, which needs to acquire the private mutex. > > > > > > Close node: The UVC driver holds the private mutex while doing > > > a deactivate, which needs to acquire connect_lock. > > > > > > Any ideas on how to clear this up? > > > > > > > I think my question above gives us two options out based on the answer: > > > > 1. Core handling redundant calls is the more bullet-proof solution IMO. It > > means that the gadget driver never holds connect_lock when it shouldn't. > > No more ABBA! > > > > One potential implementation is to track when core is resetting in a protected > > flag. All functions related to resetting/disconnecting would check the > > flag before > > locking connect_lock and would become no-ops if gadget is in the middle of > > resetting. > > > > 2. Some stronger guarantees will let the gadget driver's state machine decide > > if it can call usb_gadget_* functions. For example, if we can say for sure that > > disable call will always be followed by the unbind call, and that usb_gadget_* > > functions are disallowed between the two, then UVC driver can handle ABBA > > situation by tracking when it is between a disable and unbind call and skip > > calling usb_gadget_* function until unbind finishes. > > > > The downside of (2), is that a poorly written (or malicious) gadget driver can > > grind the gadget to a halt with a somewhat simple deadlock. > > > > Unfortunately, I am travelling over the next week, but I'll try to > > create and attach > > a dirty patch for core to handle redundant calls to usb_gadget_* over the next > > week. > > > > I am fairly new and don't know the full semantics around core, so if I > > am missing > > something simple here, please do let me know! > > Here's a proposal, along the lines of your first suggestion above. The > idea is to avoid holding the connect_lock mutex while invoking a gadget > driver's callbacks. > > Unfortunately, this is unavoidable in the case of the ->disconnect() > callback, but that's okay because the kerneldoc already says that > ->disconnect() may be called in_interrupt, so it's not allowed to call > any core routines that may sleep. Just to make this perfectly clear, > the patch adds a couple of comments along these lines. > > As far as I can tell, there is no real reason to hold connect_lock > during the ->unbind() callback. Not doing so should fix the problem in > the UVC function driver. Thank you for the patch (and apologies for the delay)! This is a neat fix I completely glossed over. Looked around at other places where unbind is called, and it looks like the lock isn't held anywhere else either, so dropping the lock before calling unbind shouldn't break any existing assumptions around the callback. > > Let me know if this works. > > > drivers/usb/gadget/udc/core.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > Index: usb-devel/drivers/usb/gadget/udc/core.c > =================================================================== > --- usb-devel.orig/drivers/usb/gadget/udc/core.c > +++ usb-devel/drivers/usb/gadget/udc/core.c > @@ -822,6 +822,9 @@ EXPORT_SYMBOL_GPL(usb_gadget_disconnect) > * usb_gadget_activate() is called. For example, user mode components may > * need to be activated before the system can talk to hosts. > * > + * This routine may sleep; it must not be called in interrupt context > + * (such as from within a gadget driver's disconnect() callback). > + * > * Returns zero on success, else negative errno. > */ > int usb_gadget_deactivate(struct usb_gadget *gadget) > @@ -860,6 +863,8 @@ EXPORT_SYMBOL_GPL(usb_gadget_deactivate) > * This routine activates gadget which was previously deactivated with > * usb_gadget_deactivate() call. It calls usb_gadget_connect() if needed. > * > + * This routine may sleep; it must not be called in interrupt context. > + * > * Returns zero on success, else negative errno. > */ > int usb_gadget_activate(struct usb_gadget *gadget) > @@ -1639,7 +1644,11 @@ static void gadget_unbind_driver(struct > usb_gadget_disable_async_callbacks(udc); > if (gadget->irq) > synchronize_irq(gadget->irq); > + mutex_unlock(&udc->connect_lock); > + > udc->driver->unbind(gadget); > + > + mutex_lock(&udc->connect_lock); > usb_gadget_udc_stop_locked(udc); > mutex_unlock(&udc->connect_lock); > > Tried the patch, and it fixes the issue for UVC Gadget Driver! UVC driver can now be unbound without locking up with the V4L2 release callback. Logs confirm that the calls are still being interleaved, but don't result in a deadlock now. Regards, Avi.