On 9/18/23 16:40, Avichal Rakesh wrote: > > > On 9/18/23 14:43, Michael Grzeschik wrote: >> On Mon, Sep 18, 2023 at 12:02:11PM -0700, Avichal Rakesh wrote: >>> On 9/16/23 16:23, Michael Grzeschik wrote: >>>> On Fri, Sep 15, 2023 at 07:41:05PM -0700, Avichal Rakesh wrote: >>>>> On 9/15/23 16:32, Michael Grzeschik wrote: >>>>>> On Mon, Sep 11, 2023 at 09:52:22PM -0700, Avichal Rakesh wrote: >>>>>>> On 9/10/23 17:24, Michael Grzeschik wrote: >>>>>>>> The uvc_video_enable function of the uvc-gadget driver is dequeing and >>>>>>>> immediately deallocs all requests on its disable codepath. This is not >>>>>>>> save since the dequeue function is async and does not ensure that the >>>>>>>> requests are left unlinked in the controller driver. >>>>>>>> >>>>>>>> ... >>> >>> I do apologize if I've missed something obvious with your changes that >>> prevents such interleaving. I don't currently see any locking or >>> other synchronization mechanism in your changes. Is there something >>> in dwc3 that prevents this situation? >> >> I think you have pointed it out totally clear. This is obviously the >> case. It just did not trigger here. But the window is there and has to >> be locked in some way. >> >> For now we have two options to solve it. >> >> 1) Trying to avoid this double code path of the complete callback and >> uvc_video_free_requests. This is what your patches are already doing. >> >> But for now I am not so pleased with the timeout concept by waiting for >> the complete interrupt to be called. This is also a shot in the dark as >> the latency depends on the scheduler and the amount of potential >> requests that are being handled. > > I agree, a timeout is not the most elegant of solutions and given a > weird enough scheduler, will run into issues as well . Thinking about this some more: I still agree that a timeout seems arbitrary and will (rightly) lead to questions along the lines of "If we can safely move on after 500ms, why not safely move on immediately?". However, to me it seems more reasonable to put an indefinite wait. The uvc gadget can wait forever for the complete callbacks to come through. This basically says that until the usb controller returns the usb_requests back to uvc gadget, it can't guarantee a consistent memory state as it is responsible for managing the usb_requests it has allocated. Of course there is no such thing as an indefinite wait, the watchdog will eventually kick in to reap the subsystem (potentially causing a kernel panic). But it seems reasonable to say that if the usb controller is unable to return the usb_requests in however long it takes the watchdog to bite, it has no business running uvc gadget anyway. My changes in https://lore.kernel.org/20230912041910.726442-2-arakesh@xxxxxxxxxx will ensure that no other control request comes in while uvc gadget is waiting, and it should be relatively trivial to update https://lore.kernel.org/20230912041910.726442-3-arakesh@xxxxxxxxxx to wait indefinitely. Laurent and Dan, does an indefinite wait seem more reasonable than an arbitrary wait, or would something like the suggestion in previous email be better? > >> >> 2) Locking both codepathes around the resource in question so the issue >> is avoided. >> >> However, I am also not a fried of many locks. >> >> Perhaps it is possible to use a combination of wait_for_completion in >> the uvc_video_free_requests and a complete callback in >> uvc_video_complete for those requests that are not listed in the >> req_free list. >> >> What do you think? >> > There might be a way that builds on your idea of cleaning up in the complete callback. > It would rely on having a uvc_requests that aren't bulk allocated, which may have a > performance impact. > > I am imagining something like the following: > 1. Instead of allocating a bulk of uvc_requests, we allocate them > one at a time and add them to uvc_video.ureq > 2. uvc_video.ureq becomes a list_head containing all the individual > requests > 3. We add a sentinel flag in uvc_request that says the request is > now stale. This flag is protected by uvc_video->req_lock > 4. uvc_video_complete looks at this flag to deallocate both > usb_request and uvc_request. > 5. uvcg_video_enable looks something like the following: > uvcg_video_enable(...) { > ... > lock(req_lock); > forall (uvc_requests->ureqs) {ureq->stale = true} > unlock(req_lock); > usb_ep_dequeue all reqs > > uvc_video_free_requests(...) > ... > } > 6. uvc_video_complete looks something like: > uvc_video_complete(...) { > // at the start > lock(req_lock) > is_stale = ureq->stale; > unlock(req_lock); > > if (is_stale) { > usb_ep_free_request(); > dealloc corresponding uvc_request(); > return; > } > > ... > > lock(req_lock); > // possible that request became stale while we were handling stuff > if (!ureq->stale) { > list_add_tail(&req->list, &video->req_free); > } else { > usb_ep_free_request(); > dealloc corresponding uvc_request(); > } > unlock(req_lock); > } > 7. uvc_video_free_requests can freely dealloc usb_requests/uvc_requests in > req_free because we can be certain that uvc_video_complete won't modify > it once requests have been marked stale, and the stale requests in flight > will be cleaned up by the complete callback. > > Effectively, we freeze the state of req_free before dequeuing, and all > inflight requests are considered the responsibility of the complete handler > from that point onwards. The gadget is only responsible for freeing requests it > currently owns. > > I think this should ensure that we never have a situation where the ownership of the > requests are undefined, and only one thread is responsible for freeing any given request. > > Hope that makes sense! > - Avi.