Hi Felipe, On 7/20/2021 11:30 PM, Felipe Balbi wrote: > > Hi Wesley, > > (first of all, sorry for the super long delay. This really fell through > the cracks) > No problem, I understand that you probably get a whole bunch of things all at once, so its hard to keep track of each one :). > Wesley Cheng <wcheng@xxxxxxxxxxxxxx> writes: >> Hi Felipe, >> >> On 6/9/2021 1:57 PM, Wesley Cheng wrote: >>> Hi Felipe, >>> >>> On 5/19/2021 1:52 AM, Wesley Cheng wrote: >>>> >>>> >>>> On 5/11/2021 1:13 AM, Felipe Balbi wrote: >>>>> >>>>> Hi, >>>>> >>>>> Wesley Cheng <wcheng@xxxxxxxxxxxxxx> writes: >>>>>> The list_for_each_entry_safe() macro saves the current item (n) and >>>>>> the item after (n+1), so that n can be safely removed without >>>>>> corrupting the list. However, when traversing the list and removing >>>>>> items using gadget giveback, the DWC3 lock is briefly released, >>>>>> allowing other routines to execute. There is a situation where, while >>>>>> items are being removed from the cancelled_list using >>>>>> dwc3_gadget_ep_cleanup_cancelled_requests(), the pullup disable >>>>>> routine is running in parallel (due to UDC unbind). As the cleanup >>>>>> routine removes n, and the pullup disable removes n+1, once the >>>>>> cleanup retakes the DWC3 lock, it references a request who was already >>>>>> removed/handled. With list debug enabled, this leads to a panic. >>>>>> Ensure all instances of the macro are replaced where gadget giveback >>>>>> is used. >>>>>> >>>>>> Example call stack: >>>>>> >>>>>> Thread#1: >>>>>> __dwc3_gadget_ep_set_halt() - CLEAR HALT >>>>>> -> dwc3_gadget_ep_cleanup_cancelled_requests() >>>>>> ->list_for_each_entry_safe() >>>>>> ->dwc3_gadget_giveback(n) >>>>>> ->dwc3_gadget_del_and_unmap_request()- n deleted[cancelled_list] >>>>>> ->spin_unlock >>>>>> ->Thread#2 executes >>>>>> ... >>>>>> ->dwc3_gadget_giveback(n+1) >>>>>> ->Already removed! >>>>>> >>>>>> Thread#2: >>>>>> dwc3_gadget_pullup() >>>>>> ->waiting for dwc3 spin_lock >>>>>> ... >>>>>> ->Thread#1 released lock >>>>>> ->dwc3_stop_active_transfers() >>>>>> ->dwc3_remove_requests() >>>>>> ->fetches n+1 item from cancelled_list (n removed by Thread#1) >>>>>> ->dwc3_gadget_giveback() >>>>>> ->dwc3_gadget_del_and_unmap_request()- n+1 deleted[cancelled_list] >>>>>> ->spin_unlock >>>>>> >>>>>> Fixes: d4f1afe5e896 ("usb: dwc3: gadget: move requests to cancelled_list") >>>>>> Signed-off-by: Wesley Cheng <wcheng@xxxxxxxxxxxxxx> >>>>>> Reviewed-by: Peter Chen <peter.chen@xxxxxxxxxx> >>>>>> --- >>>>>> Changes in v2: >>>>>> - Updated commit message with context call stack of an example scenario >>>>>> seen on device. >>>>>> >>>>>> drivers/usb/dwc3/gadget.c | 8 ++++---- >>>>>> 1 file changed, 4 insertions(+), 4 deletions(-) >>>>>> >>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c >>>>>> index dd80e5c..efa939b 100644 >>>>>> --- a/drivers/usb/dwc3/gadget.c >>>>>> +++ b/drivers/usb/dwc3/gadget.c >>>>>> @@ -1737,10 +1737,10 @@ static void dwc3_gadget_ep_skip_trbs(struct dwc3_ep *dep, struct dwc3_request *r >>>>>> static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep) >>>>>> { >>>>>> struct dwc3_request *req; >>>>>> - struct dwc3_request *tmp; >>>>>> struct dwc3 *dwc = dep->dwc; >>>>>> >>>>>> - list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) { >>>>>> + while (!list_empty(&dep->cancelled_list)) { >>>>>> + req = next_request(&dep->cancelled_list); >>>>> >>>>> couldn't this be solved list_replace_init() instead? Then we can keep >>>>> using the regular list_for_each_entry_safe() which has an added semantic >>>>> meaning due to its name. >>>>> >>>> >>>> Hi Felipe, >>>> >>>> Sorry for the late response. So I tried with a list_replace_init() to >>>> within the list_for_each_entry_safe() loop to update tmp w/ the >>>> cancelled_list list head, but the issue was still observed. This is >>>> because we can't replace the reference the loop already has stored in >>>> tmp, which is simply updated as the current item on the next iteration. >>>> >>>> I believe this is what you were trying to achieve? >>>> >>> Was wondering if you had any further inputs on this change? As >>> mentioned, I tried a few things with list_replace_init(), which did not >>> work. >>> >> >> Sorry for the ping. Is this change OK to add as is? We've been running >> into this instance pretty frequently during our testing, so just wanted >> to close on the proper changes being merged upstream. > > The idea is this: > > struct list_head local; > > spin_lock_irq(&lock); > list_replace_init(&dwc->cancelled_list, &local); > spin_unlock_irq(&lock); > > list_for_each_entry_safe(req, tmp, &local, list) { > /* ... */ > } > > It looks to me this should work fine, no? You can also follow what > drivers/usb/core/hcd.c is doing in usb_giveback_urb_bh() and restarting > if dwc->cancelled_list is not empty after list_for_each_entry_safe(). > > Can you give that one a shot? > Great, thanks for this suggestion! Now I understand what you were referring to. I gave this a try and it works well. Will prepare a change to replace both places with list_replace_init() Thanks Wesley Cheng -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project