Wesley Cheng wrote:
> Hi Thinh,
>
> On 2/15/2022 9:14 AM, Thinh Nguyen wrote:
>> Jung Daehwan wrote:
>>> Hi Thinh,
>>>
>>> On Mon, Feb 14, 2022 at 06:44:33PM +0000, Thinh Nguyen wrote:
>>>> Hi,
>>>>
>>>> Daehwan Jung wrote:
>>>>> It always sets DWC3_EP_END_TRANSFER_PENDING in dwc3_stop_active_transfer
>>>>> even if dwc3_send_gadget_ep_cmd fails. It can cause some problems like
>>>>
>>>> How does it fail? Timed out?
>>>
>>> Yes, timed out.
>>>>
>>>>> skipping clear stall command or giveback from dequeue. We fix to set it
>>>>> only when ep cmd success. Additionally, We clear DWC3_EP_TRANSFER_STARTED
>>>>> for next trb to start transfer not update transfer.
>>>>
>>>> We shouldn't do this. Things will be out of sync. It may work for 1
>>>> scenario, but it won't work for others.
>>>>
>>>> Please help me understand a few things:
>>>>
>>>> 1) What is the scenario that triggers this? Is it random?
>>>>
>>> ep cmd timeout occurs on dequeue request from user side. End Transfer command
>>> would be sent in dwc3_stop_active transfer.
>>
>> At the high level, what's triggering the request dequeue? Is it from a
>> disconnect, change of interface, or simply function driver protocol that
>> changes it.
>>
>> What application was used to trigger this?
>>
> Sorry for jumping in here, but looks like Daehwan is running into a
> similar issue I am seeing as well.
>
> At least in my scenario, the dequeue is coming from a function driver
> which exposes a device to userspace. Once that device is closed, it
> will issue a dequeue on all pending/submitted requests.

The dequeue request is coming from the function driver, but what causes
the dequeue? For example, an End Transfer command due to a disconnect
may give different clues than a dequeue from a change of interface.

>
>>>
>>>> 2) Are there other traffics pending while issuing the End Transfer
>>>> command? If so, what transfer type(s)?
>>>>
>>> I haven't checked it yet.
>>
>> Can you check?
>>
> For the cases where we've collected a crash log, we can see that during
> the END transfer timeouts there was always a pending EP0 transaction.
> We had reached out to our internal HW folks to get some inputs on what
> could be causing this kind of issue, and we were able to get some
> recommendations from their Synopsis POCs.

It's "Synopsys" :)

>
> It was mentioned that if there was an active EP0 transfer, an end
> transfer command on a non-control EP can fail w/ timed out.
>

What controller version are you using? And what version is Jung using?
Do you have the STAR number of the issue?

If you're using a different version than Jung's, then it may not be the
same issue.

>>>
>>>> 3) Have you tried increasing the timeout?
>>>>
>>> No, I haven't.
>>
>> Can you try up to 10 seconds (just for experiment)
>>
> I've tried this too, and it did not help.
>
>>>> BR,
>>>> Thinh
>>>>
>>>
>>> This issue occurs very rarely on customer. I only have restricted
>>> information. That's why I've been trying to reproduce it.
>>
>> How did you test your fix if you can't reproduce it?
>>
>>>
>>> Wesley may have run into same issue and you can see this issue in detail.
>>> https://urldefense.com/v3/__https://protect2.fireeye.com/v1/url?k=9d423b69-fc3fd32e-9d43b026-74fe485fff30-77a099b52659410d&q=1&e=20b4d9f5-2599-4f57-8b6a-7c4ec167d228&u=https*3A*2F*2Flore.kernel.org*2Flinux-usb*2F20220203080017.27339-1-quic_wcheng*40quicinc.com*2F__;JSUlJSUlJQ!!A4F2R9G_pg!JWPzNLoO3BFX_IZCVzmHPtxq6frr_VFbSNNaxSQylunt1Y4EauTOefth2LCIcVEuTx8E$
>>>
>>
>> I can take a look, but please provide the tracepoints of the failure if
>> you can reproduce it.
>>
> Let me see if I have any previous traces I can share. If not, I have a
> pretty reliable repro set up I can collect a trace for you. For now, I
> will focus on just getting the endxfer timeout seen during ep dequeue.
> As mentioned on my patchset, this can happen during device-initiated
> disconnect as well.
>

Your patch set is still on my todo list.
I haven't reviewed it. I have some concerns from a first glance; I'll
check it out more thoroughly later.

Can you provide the tracepoints?

Thanks,
Thinh