On Fri, Jul 28, 2023 at 10:05:39AM -0400, Alan Stern wrote: > On Fri, Jul 28, 2023 at 01:57:15PM +0530, Avichal Rakesh wrote: > > On Sat, Jul 22, 2023 at 8:57 PM Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote: > > > > Here's a proposal, along the lines of your first suggestion above. The > > > idea is to avoid holding the connect_lock mutex while invoking a gadget > > > driver's callbacks. > > > > > > Unfortunately, this is unavoidable in the case of the ->disconnect() > > > callback, but that's okay because the kerneldoc already says that > > > ->disconnect() may be called in_interrupt, so it's not allowed to call > > > any core routines that may sleep. Just to make this perfectly clear, > > > the patch adds a couple of comments along these lines. > > > > > > As far as I can tell, there is no real reason to hold connect_lock > > > during the ->unbind() callback. Not doing so should fix the problem in > > > the UVC function driver. > > > > Thank you for the patch (and apologies for the delay)! This is a neat > > fix I completely glossed over. Looked around at other places where > > unbind is called, and it looks like the lock isn't held anywhere else > > either, so dropping the lock before calling unbind shouldn't break any > > existing assumptions around the callback. > > > Tried the patch, and it fixes the issue for UVC Gadget Driver! UVC > > driver can now be unbound without locking up with the V4L2 release > > callback. Logs confirm that the calls are still being interleaved, but > > don't result in a deadlock now. > > Thanks for trying it out. Is it okay for me to add your "Tested-by:" > tag when the patch is submitted? Another thing... It's not clear that the patch will fix the problem entirely. Your original analysis of the bug stated: > This means that attempting to unregister the UVC Gadget Driver results in the > V4L2 resource cleanup being stuck behind udc->connect_lock, which will only > be released after uvc_function_unbind finishes. This results in either the > gadget deactivating after the unbind process has finished, or in a Kernel Panic > as it tries to cleanup a V4L2 node that has been purged. My patch removes the locking issue. But if an execution path can occur with a lock present, it can also occur when the lock has been removed. That means it may still be possible for the UVC gadget driver to try deactivating the UDC after the unbind process has finished or for it to try cleaning up a V4L2 node that has been purged. If either of those really could have happened (as opposed to just getting stuck in a deadlock, waiting for a mutex that would never be released), then it can still happen with the patch. Fixing them would require changes to the UVC gadget driver. So the problem may not be gone entirely. Alan Stern