Hi, Wesley Cheng <wcheng@xxxxxxxxxxxxxx> writes: >>> The list_for_each_entry_safe() macro saves the current item (n) and >>> the item after (n+1), so that n can be safely removed without >>> corrupting the list. However, when traversing the list and removing >>> items using gadget giveback, the DWC3 lock is briefly released, >>> allowing other routines to execute. There is a situation where, while >>> items are being removed from the cancelled_list using >>> dwc3_gadget_ep_cleanup_cancelled_requests(), the pullup disable >>> routine is running in parallel (due to UDC unbind). As the cleanup >>> routine removes n, and the pullup disable removes n+1, once the >>> cleanup retakes the DWC3 lock, it references a request who was already >>> removed/handled. With list debug enabled, this leads to a panic. >>> Ensure all instances of the macro are replaced where gadget giveback >>> is used. >>> >>> Example call stack: >>> >>> Thread#1: >>> __dwc3_gadget_ep_set_halt() - CLEAR HALT >>> -> dwc3_gadget_ep_cleanup_cancelled_requests() >>> ->list_for_each_entry_safe() >>> ->dwc3_gadget_giveback(n) >>> ->dwc3_gadget_del_and_unmap_request()- n deleted[cancelled_list] >>> ->spin_unlock >>> ->Thread#2 executes >>> ... >>> ->dwc3_gadget_giveback(n+1) >>> ->Already removed! >>> >>> Thread#2: >>> dwc3_gadget_pullup() >>> ->waiting for dwc3 spin_lock >>> ... >>> ->Thread#1 released lock >>> ->dwc3_stop_active_transfers() >>> ->dwc3_remove_requests() >>> ->fetches n+1 item from cancelled_list (n removed by Thread#1) >>> ->dwc3_gadget_giveback() >>> ->dwc3_gadget_del_and_unmap_request()- n+1 >>> deleted[cancelled_list] >>> ->spin_unlock >>> >>> Fix this condition by utilizing list_replace_init(), and traversing >>> through a local copy of the current elements in the endpoint lists. >>> This will also set the parent list as empty, so if another thread is >>> also looping through the list, it will be empty on the next iteration. >>> >>> Fixes: d4f1afe5e896 ("usb: dwc3: gadget: move requests to cancelled_list") >>> Signed-off-by: Wesley Cheng <wcheng@xxxxxxxxxxxxxx> >>> >>> --- >>> Previous patchset: >>> https://lore.kernel.org/linux-usb/1620716636-12422-1-git-send-email-wcheng@xxxxxxxxxxxxxx/ >>> --- >>> drivers/usb/dwc3/gadget.c | 18 ++++++++++++++++-- >>> 1 file changed, 16 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c >>> index a29a4ca..3ce6ed9 100644 >>> --- a/drivers/usb/dwc3/gadget.c >>> +++ b/drivers/usb/dwc3/gadget.c >>> @@ -1926,9 +1926,13 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep) >>> { >>> struct dwc3_request *req; >>> struct dwc3_request *tmp; >>> + struct list_head local; >>> struct dwc3 *dwc = dep->dwc; >>> >>> - list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) { >>> +restart: >>> + list_replace_init(&dep->cancelled_list, &local); >> >> hmm, if the lock is held and IRQs disabled when this runs, then no other >> threads will be able to append requests to the list which makes the >> "restart" label unnecessary, no? > > We do still call dwc3_gadget_giveback() which would release the lock > briefly, so if there was another thread waiting on dwc->lock, it would > be able to add additional items to that list. > >> >> I wonder if we should release the lock and reenable interrupts after >> replacing the head. The problem is that >> dwc3_gadget_ep_cleanup_cancelled_requests() can run from the IRQ >> handler. >> > > We would also need to consider that some of the APIs being called in > these situations would also have the assumption that the dwc->lock is > held, ie dwc3_gadget_giveback() yeah, good point. I think we're good to integrate this, unless Alan can shed some light on some particular possible race scenario we may have missed. In any case: Acked-by: Felipe Balbi <balbi@xxxxxxxxxx> -- balbi