Hi Wesley, (first of all, sorry for the super long delay. This really fell through the cracks) Wesley Cheng <wcheng@xxxxxxxxxxxxxx> writes: > Hi Felipe, > > On 6/9/2021 1:57 PM, Wesley Cheng wrote: >> Hi Felipe, >> >> On 5/19/2021 1:52 AM, Wesley Cheng wrote: >>> >>> >>> On 5/11/2021 1:13 AM, Felipe Balbi wrote: >>>> >>>> Hi, >>>> >>>> Wesley Cheng <wcheng@xxxxxxxxxxxxxx> writes: >>>>> The list_for_each_entry_safe() macro saves the current item (n) and >>>>> the item after (n+1), so that n can be safely removed without >>>>> corrupting the list. However, when traversing the list and removing >>>>> items using gadget giveback, the DWC3 lock is briefly released, >>>>> allowing other routines to execute. There is a situation where, while >>>>> items are being removed from the cancelled_list using >>>>> dwc3_gadget_ep_cleanup_cancelled_requests(), the pullup disable >>>>> routine is running in parallel (due to UDC unbind). As the cleanup >>>>> routine removes n, and the pullup disable removes n+1, once the >>>>> cleanup retakes the DWC3 lock, it references a request who was already >>>>> removed/handled. With list debug enabled, this leads to a panic. >>>>> Ensure all instances of the macro are replaced where gadget giveback >>>>> is used. >>>>> >>>>> Example call stack: >>>>> >>>>> Thread#1: >>>>> __dwc3_gadget_ep_set_halt() - CLEAR HALT >>>>> -> dwc3_gadget_ep_cleanup_cancelled_requests() >>>>> ->list_for_each_entry_safe() >>>>> ->dwc3_gadget_giveback(n) >>>>> ->dwc3_gadget_del_and_unmap_request()- n deleted[cancelled_list] >>>>> ->spin_unlock >>>>> ->Thread#2 executes >>>>> ... >>>>> ->dwc3_gadget_giveback(n+1) >>>>> ->Already removed! >>>>> >>>>> Thread#2: >>>>> dwc3_gadget_pullup() >>>>> ->waiting for dwc3 spin_lock >>>>> ... >>>>> ->Thread#1 released lock >>>>> ->dwc3_stop_active_transfers() >>>>> ->dwc3_remove_requests() >>>>> ->fetches n+1 item from cancelled_list (n removed by Thread#1) >>>>> ->dwc3_gadget_giveback() >>>>> ->dwc3_gadget_del_and_unmap_request()- n+1 deleted[cancelled_list] >>>>> ->spin_unlock >>>>> >>>>> Fixes: d4f1afe5e896 ("usb: dwc3: gadget: move requests to cancelled_list") >>>>> Signed-off-by: Wesley Cheng <wcheng@xxxxxxxxxxxxxx> >>>>> Reviewed-by: Peter Chen <peter.chen@xxxxxxxxxx> >>>>> --- >>>>> Changes in v2: >>>>> - Updated commit message with context call stack of an example scenario >>>>> seen on device. >>>>> >>>>> drivers/usb/dwc3/gadget.c | 8 ++++---- >>>>> 1 file changed, 4 insertions(+), 4 deletions(-) >>>>> >>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c >>>>> index dd80e5c..efa939b 100644 >>>>> --- a/drivers/usb/dwc3/gadget.c >>>>> +++ b/drivers/usb/dwc3/gadget.c >>>>> @@ -1737,10 +1737,10 @@ static void dwc3_gadget_ep_skip_trbs(struct dwc3_ep *dep, struct dwc3_request *r >>>>> static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep) >>>>> { >>>>> struct dwc3_request *req; >>>>> - struct dwc3_request *tmp; >>>>> struct dwc3 *dwc = dep->dwc; >>>>> >>>>> - list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) { >>>>> + while (!list_empty(&dep->cancelled_list)) { >>>>> + req = next_request(&dep->cancelled_list); >>>> >>>> couldn't this be solved list_replace_init() instead? Then we can keep >>>> using the regular list_for_each_entry_safe() which has an added semantic >>>> meaning due to its name. >>>> >>> >>> Hi Felipe, >>> >>> Sorry for the late response. So I tried with a list_replace_init() to >>> within the list_for_each_entry_safe() loop to update tmp w/ the >>> cancelled_list list head, but the issue was still observed. This is >>> because we can't replace the reference the loop already has stored in >>> tmp, which is simply updated as the current item on the next iteration. >>> >>> I believe this is what you were trying to achieve? >>> >> Was wondering if you had any further inputs on this change? As >> mentioned, I tried a few things with list_replace_init(), which did not >> work. >> > > Sorry for the ping. Is this change OK to add as is? We've been running > into this instance pretty frequently during our testing, so just wanted > to close on the proper changes being merged upstream. The idea is this: struct list_head local; spin_lock_irq(&lock); list_replace_init(&dwc->cancelled_list, &local); spin_unlock_irq(&lock); list_for_each_entry_safe(req, tmp, &local, list) { /* ... */ } It looks to me this should work fine, no? You can also follow what drivers/usb/core/hcd.c is doing in usb_giveback_urb_bh() and restarting if dwc->cancelled_list is not empty after list_for_each_entry_safe(). Can you give that one a shot? -- balbi
Attachment:
signature.asc
Description: PGP signature