Hi Michael, Apologies for the late reply, I have been out travelling. On Mon, Sep 4, 2023 at 10:36 PM Michael Grzeschik <mgr@xxxxxxxxxxxxxx> wrote: > > Hi Avichal > > Cc'ing: Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> > > On Wed, Aug 30, 2023 at 01:38:23PM -0700, Avichal Rakesh wrote: > >On 6/15/23 10:15, Laurent Pinchart wrote: > >> On Thu, Jun 15, 2023 at 10:34:06AM +0100, Daniel Scally wrote: > >>> Currently the UVC Gadget's .set_alt() callback disables the USB > >>> endpoint, following which a V4L2 event is queued that closes > >>> down the workqueue. This ordering results in repeated calls to > >>> usb_ep_queue() from the workqueue whilst usb_ep_disable() is > >>> running - behaviour that the documentation for usb_ep_disable() > >>> specifically prohibits. > >>> > >>> Move the call to usb_ep_disable() until after cancel_work_sync(), > >>> which will guarantee the endpoint is no longer in use when > >>> usb_ep_disable() is called. > >>> > >>> Signed-off-by: Daniel Scally <dan.scally@xxxxxxxxxxxxxxxx> > >>> --- > >>> drivers/usb/gadget/function/f_uvc.c | 3 --- > >>> drivers/usb/gadget/function/uvc_video.c | 4 ++++ > >>> 2 files changed, 4 insertions(+), 3 deletions(-) > >>> > >>> diff --git a/drivers/usb/gadget/function/f_uvc.c b/drivers/usb/gadget/function/f_uvc.c > >>> index 5e919fb65833..4b91bd572a83 100644 > >>> --- a/drivers/usb/gadget/function/f_uvc.c > >>> +++ b/drivers/usb/gadget/function/f_uvc.c > >>> @@ -337,9 +337,6 @@ uvc_function_set_alt(struct usb_function *f, unsigned interface, unsigned alt) > >>> if (uvc->state != UVC_STATE_STREAMING) > >>> return 0; > >>> > >>> - if (uvc->video.ep) > >>> - usb_ep_disable(uvc->video.ep); > >>> - > >>> memset(&v4l2_event, 0, sizeof(v4l2_event)); > >>> v4l2_event.type = UVC_EVENT_STREAMOFF; > >>> v4l2_event_queue(&uvc->vdev, &v4l2_event); > >> > >> If we don't disable the endpoint here, should we return > >> USB_GADGET_DELAYED_STATUS here (and call uvc_function_setup_continue() > >> in uvc_v4l2_streamoff()) or is that not needed ? The uvc->state updated > >> should then possibly be moved to uvc_v4l2_streamoff() (checking if this > >> would cause any race condition would also be a nice bonus :-)). > >> > > > >Hey all, > > > >First off, apologies for reviving an old thread. We've also been seeing the > >"no resource for ep2in" warning from dwc3 controller, followed by the UVC gadget > >not streaming any frames, when there is a quick STREAMOFF->STREAMON sequence. > > > >It looks like this is the same root cause as what Dan mentioned in > >https://lore.kernel.org/20230531085544.253363-1-dan.scally@xxxxxxxxxxxxxxxx/ > >and this patch seems to solve. (Thank you Dan, for posting the stacktrace in > >that email thread! I had been banging my head for a couple of days before > >thinking of looking up the exact stack :|) > > > >IIUC, this stems from workqueue not fully shutting down by the time the usb > >endpoint is disabled and we need stronger guarantees that the workqueue pumping > >usb_requests doesn't accidentally queue usb_requests _after_ we've disabled > >the streaming endpoint on the usb controller. > > > >Attached is a patch that attempts to address the concerns raised here and sets > >up some stronger timing and memory guarantees. > > > >So here are the list of changes over what Dan had already started: > > > > - Return USB_GADGET_DELAYED_STATUS from set_alt. This is to ensure there are no > > more control transfers while we're waiting on the workqueue to wind down. > > > > - Move updating uvc->state to uvc_v4l2_streamoff as Laurent suggested. > > This ensures that setting uvc->state to STREAMING or CONNECTED > > happens as a result of streamoff and streamon events which makes things > > easier to reason about. > > > > - Guard video_pump with uvc->state so the thread can be stopped by setting > > uvc->state to anything other than UVC_STATE_STREAMING. This effectively makes > > uvc->state a flag for the video_pump loop to exit. This is the same flag that > > the complete callback uses to restart the workqueue, so all interactions > > with the controller are effectively guarded by the same flag now. > > > > - Set uvc->state to UVC_STATE_CONNECTED before winding down the work queue. > > Now that all usb logic is guarded by the same flag, setting the flag should > > stop all usb packet queuing once current execution finishes. > > > > - Add some memory guarantees in uvcg_video_enable(). At the moment, we don't > > really consider the fact that usb_ep_dequeue is async, which means that the > > usb_requests may not have been returned to the gadget before we start > > deallocating them. With this patch, we wait until all usb_requests have been > > returned to us before deallocating them (this adds a little bit of > > bookkeeping, but nothing too complicated). > > I am currently trying to solve that by preparing a patch that is > fixing the use of the requests when deallocating them. Since currently > the uvc_gadget is also running into wild use after free issues because > of exactly that async dequeue and dealloc situation. Do you already have a patch up for this? It seems my LKML-fu is failing and I can't seem to find the thread. If you aren't too deep into the patch, can you take a look at the request counting mechanism added in my patch? If you have a (somewhat) consistent repro of the use-after-dealloc issue, runnin it through the whole patch would be very appreciated! It is supposed to fix the exact problem you've described. > IMHO it should be > save to call dealloc after calling dequeue. Which is probably true for > other usb device controller driver other then dwc3. Perhaps Thinh or someone better versed in Gadget API can chime in on this, but as it stands usb_ep_dequeue specifically says that it is async, and gadget drivers must wait on the complete callbacks to regain ownership of the usb_request. Until the API change is made, UVC should adhere to the current API? > > For some background. The dwc3 is putting the requests into a cancelled list > that will be cleared by the interrupt handler and that is dequeuing them > instead. In between the dequeue call and the interrupt call the uvc layer could > dealloc the request which leads the interrupt handler to dequeue an > already freed request. This roughly tracks with what I gleaned from skimming the DWC3 code as well. In local tests the complete calls were always timely and I never actually ran into the situation where UVC deallocated an unowned request, but as someone (I think it was Alan?) said in a previous thread: technically possible just means it will happen eventually Please do review/test the patch. I'll send out a formal patch on Monday once I am back, but would love to have some early eyes take a look in case there is something obvious I missed. Thank you. - Avi