Hi, Wesley Cheng <wcheng@xxxxxxxxxxxxxx> writes: > The list_for_each_entry_safe() macro saves the current item (n) and > the item after (n+1), so that n can be safely removed without > corrupting the list. However, when traversing the list and removing > items using gadget giveback, the DWC3 lock is briefly released, > allowing other routines to execute. There is a situation where, while > items are being removed from the cancelled_list using > dwc3_gadget_ep_cleanup_cancelled_requests(), the pullup disable > routine is running in parallel (due to UDC unbind). As the cleanup > routine removes n, and the pullup disable removes n+1, once the > cleanup retakes the DWC3 lock, it references a request who was already > removed/handled. With list debug enabled, this leads to a panic. > Ensure all instances of the macro are replaced where gadget giveback > is used. > > Example call stack: > > Thread#1: > __dwc3_gadget_ep_set_halt() - CLEAR HALT > -> dwc3_gadget_ep_cleanup_cancelled_requests() > ->list_for_each_entry_safe() > ->dwc3_gadget_giveback(n) > ->dwc3_gadget_del_and_unmap_request()- n deleted[cancelled_list] > ->spin_unlock > ->Thread#2 executes > ... > ->dwc3_gadget_giveback(n+1) > ->Already removed! > > Thread#2: > dwc3_gadget_pullup() > ->waiting for dwc3 spin_lock > ... > ->Thread#1 released lock > ->dwc3_stop_active_transfers() > ->dwc3_remove_requests() > ->fetches n+1 item from cancelled_list (n removed by Thread#1) > ->dwc3_gadget_giveback() > ->dwc3_gadget_del_and_unmap_request()- n+1 > deleted[cancelled_list] > ->spin_unlock > > Fix this condition by utilizing list_replace_init(), and traversing > through a local copy of the current elements in the endpoint lists. > This will also set the parent list as empty, so if another thread is > also looping through the list, it will be empty on the next iteration. > > Fixes: d4f1afe5e896 ("usb: dwc3: gadget: move requests to cancelled_list") > Signed-off-by: Wesley Cheng <wcheng@xxxxxxxxxxxxxx> > > --- > Previous patchset: > https://lore.kernel.org/linux-usb/1620716636-12422-1-git-send-email-wcheng@xxxxxxxxxxxxxx/ > --- > drivers/usb/dwc3/gadget.c | 18 ++++++++++++++++-- > 1 file changed, 16 insertions(+), 2 deletions(-) > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c > index a29a4ca..3ce6ed9 100644 > --- a/drivers/usb/dwc3/gadget.c > +++ b/drivers/usb/dwc3/gadget.c > @@ -1926,9 +1926,13 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep) > { > struct dwc3_request *req; > struct dwc3_request *tmp; > + struct list_head local; > struct dwc3 *dwc = dep->dwc; > > - list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) { > +restart: > + list_replace_init(&dep->cancelled_list, &local); hmm, if the lock is held and IRQs disabled when this runs, then no other threads will be able to append requests to the list which makes the "restart" label unnecessary, no? I wonder if we should release the lock and reenable interrupts after replacing the head. The problem is that dwc3_gadget_ep_cleanup_cancelled_requests() can run from the IRQ handler. Alan, could you provide your insight here? Do you think we should defer this to a low priority tasklet or something along those lines? > + list_for_each_entry_safe(req, tmp, &local, list) { > dwc3_gadget_ep_skip_trbs(dep, req); > switch (req->status) { > case DWC3_REQUEST_STATUS_DISCONNECTED: -- balbi