Hi Felipe, On 7/29/2021 1:09 AM, Felipe Balbi wrote: > > Hi, > > Wesley Cheng <wcheng@xxxxxxxxxxxxxx> writes: > >> The list_for_each_entry_safe() macro saves the current item (n) and >> the item after (n+1), so that n can be safely removed without >> corrupting the list. However, when traversing the list and removing >> items using gadget giveback, the DWC3 lock is briefly released, >> allowing other routines to execute. There is a situation where, while >> items are being removed from the cancelled_list using >> dwc3_gadget_ep_cleanup_cancelled_requests(), the pullup disable >> routine is running in parallel (due to UDC unbind). As the cleanup >> routine removes n, and the pullup disable removes n+1, once the >> cleanup retakes the DWC3 lock, it references a request who was already >> removed/handled. With list debug enabled, this leads to a panic. >> Ensure all instances of the macro are replaced where gadget giveback >> is used. >> >> Example call stack: >> >> Thread#1: >> __dwc3_gadget_ep_set_halt() - CLEAR HALT >> -> dwc3_gadget_ep_cleanup_cancelled_requests() >> ->list_for_each_entry_safe() >> ->dwc3_gadget_giveback(n) >> ->dwc3_gadget_del_and_unmap_request()- n deleted[cancelled_list] >> ->spin_unlock >> ->Thread#2 executes >> ... >> ->dwc3_gadget_giveback(n+1) >> ->Already removed! >> >> Thread#2: >> dwc3_gadget_pullup() >> ->waiting for dwc3 spin_lock >> ... >> ->Thread#1 released lock >> ->dwc3_stop_active_transfers() >> ->dwc3_remove_requests() >> ->fetches n+1 item from cancelled_list (n removed by Thread#1) >> ->dwc3_gadget_giveback() >> ->dwc3_gadget_del_and_unmap_request()- n+1 >> deleted[cancelled_list] >> ->spin_unlock >> >> Fix this condition by utilizing list_replace_init(), and traversing >> through a local copy of the current elements in the endpoint lists. >> This will also set the parent list as empty, so if another thread is >> also looping through the list, it will be empty on the next iteration. >> >> Fixes: d4f1afe5e896 ("usb: dwc3: gadget: move requests to cancelled_list") >> Signed-off-by: Wesley Cheng <wcheng@xxxxxxxxxxxxxx> >> >> --- >> Previous patchset: >> https://lore.kernel.org/linux-usb/1620716636-12422-1-git-send-email-wcheng@xxxxxxxxxxxxxx/ >> --- >> drivers/usb/dwc3/gadget.c | 18 ++++++++++++++++-- >> 1 file changed, 16 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c >> index a29a4ca..3ce6ed9 100644 >> --- a/drivers/usb/dwc3/gadget.c >> +++ b/drivers/usb/dwc3/gadget.c >> @@ -1926,9 +1926,13 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep) >> { >> struct dwc3_request *req; >> struct dwc3_request *tmp; >> + struct list_head local; >> struct dwc3 *dwc = dep->dwc; >> >> - list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) { >> +restart: >> + list_replace_init(&dep->cancelled_list, &local); > > hmm, if the lock is held and IRQs disabled when this runs, then no other > threads will be able to append requests to the list which makes the > "restart" label unnecessary, no? We do still call dwc3_gadget_giveback() which would release the lock briefly, so if there was another thread waiting on dwc->lock, it would be able to add additional items to that list. > > I wonder if we should release the lock and reenable interrupts after > replacing the head. The problem is that > dwc3_gadget_ep_cleanup_cancelled_requests() can run from the IRQ > handler. > We would also need to consider that some of the APIs being called in these situations would also have the assumption that the dwc->lock is held, ie dwc3_gadget_giveback() Thanks Wesley Cheng > Alan, could you provide your insight here? Do you think we should defer > this to a low priority tasklet or something along those lines? > >> + list_for_each_entry_safe(req, tmp, &local, list) { >> dwc3_gadget_ep_skip_trbs(dep, req); >> switch (req->status) { >> case DWC3_REQUEST_STATUS_DISCONNECTED: > > -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project