Re: [PATCH v2 1/2] usb: dwc3: Not set DWC3_EP_END_TRANSFER_PENDING in ep cmd fails

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Jung,

Jung Daehwan wrote:
> On Tue, Feb 15, 2022 at 05:14:50PM +0000, Thinh Nguyen wrote:
>> Jung Daehwan wrote:
>>> Hi Thinh,
>>>
>>> On Mon, Feb 14, 2022 at 06:44:33PM +0000, Thinh Nguyen wrote:
>>>> Hi,
>>>>
>>>> Daehwan Jung wrote:
>>>>> It always sets DWC3_EP_END_TRANSFER_PENDING in dwc3_stop_active_transfer
>>>>> even if dwc3_send_gadget_ep_cmd fails. It can cause some problems like
>>>>
>>>> How does it fail? Timed out?
>>>
>>> Yes, timed out.
>>>>
>>>>> skipping clear stall commmand or giveback from dequeue. We fix to set it
>>>>> only when ep cmd success. Additionally, We clear DWC3_EP_TRANSFER_STARTED
>>>>> for next trb to start transfer not update transfer.
>>>>
>>>> We shouldn't do this. Things will be out of sync. It may work for 1
>>>> scenario, but it won't work for others.
>>>>
>>>> Please help me understand a few things:
>>>>
>>>> 1) What is the scenario that triggers this? Is it random?
>>>>
>>> ep cmd timeout occurs on dequeue request from user side. End Transfer command
>>> would be sent in dwc3_stop_active transfer.
>>
>> At the high level, what's triggering the request dequeue? Is it from a
>> disconnect, change of interface, or simply function driver protocol that
>> changes it.
>>
>> What application was used to trigger this?
>>
> Removing function driver requests dequeue.

Removing function driver? Did it go through function unbind?

>>>
>>>> 2) Are there other traffics pending while issuing the End Transfer
>>>> command? If so, what transfer type(s)?
>>>>
>>> I haven't checked it yet. 
>>
>> Can you check?
>>
> It seems there's no pending traffic.

If this is the case, then it's different than the scenario Wesley's
described.

>>>
>>>> 3) Have you tried increasing the timeout?
>>>>
>>> No, I haven't.
>>
>> Can you try up to 10 seconds (just for experiment)
>>
> Yes, it's reproduced on 10 secs.
>>>> BR,
>>>> Thinh
>>>>
>>>
>>> This issue occurs very rarely on customer. I only have restricted
>>> information. That's why I've been trying to reproduce it.
>>
>> How did you test your fix if you can't reproduce it?
>>
> 
> I sent patches to my client and he said this issue would not occur after
> applying them. In fact, They are not totally equal with those we are
> discussing. I've refactor-ed and pushed them here.
> 
>>>
>>> Wesley may have run into same issue and you can see this issue in detail.
>>> https://urldefense.com/v3/__https://protect2.fireeye.com/v1/url?k=39bd3e50-66260725-39bcb51f-0cc47a31ce4e-bf000d5a697ead66&q=1&e=b04500ad-43d0-4ee7-b5f0-a7283d8fc0da&u=https*3A*2F*2Furldefense.com*2Fv3*2F__https*3A*2F*2Fprotect2.fireeye.com*2Fv1*2Furl*3Fk*3D9d423b69-fc3fd32e-9d43b026-74fe485fff30-77a099b52659410d*26q*3D1*26e*3D20b4d9f5-2599-4f57-8b6a-7c4ec167d228*26u*3Dhttps*2A3A*2A2F*2A2Flore.kernel.org*2A2Flinux-usb*2A2F20220203080017.27339-1-quic_wcheng*2A40quicinc.com*2A2F__*3BJSUlJSUlJQ*21*21A4F2R9G_pg*21JWPzNLoO3BFX_IZCVzmHPtxq6frr_VFbSNNaxSQylunt1Y4EauTOefth2LCIcVEuTx8E*24__;JSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUl!!A4F2R9G_pg!LWmQ6CLKb5iIfxy33uvPDJ8HDhcXoveyl0bcBxj_tzZJN1gem7JWFAcD_r8v6riBYkH1$  
>>>
>>
>> I can take a look, but please provide the tracepoints of the failure if
>> you can reproduce it.
>>
> 
> Thanks to client's help, I could reproduce it in my envrionment.
> 
> 1. Trace
> 
> MtpServer-2484    [003] d..2   114.376829: dwc3_ep_queue: ep1out: req 0000000055561867 length 0/512 zsI ==> -115
> MtpServer-2484    [003] d..2   114.376838: dwc3_prepare_trb: ep1out: trb 0000000074078a95 (E23:D22) buf 0000000954845600 size 512 ctrl 0000001d (HlCS:sc:normal)
> MtpServer-2484    [003] d..2   114.376844: dwc3_prepare_trb: ep1out: trb 0000000095f503e7 (E24:D22) buf 00000000e1303000 size 512 ctrl 00000819 (HlcS:sC:normal)
> MtpServer-2484    [003] d..2   114.376867: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful

Is there any activity here? Seems like there's 70+ seconds of inactivity?

> ksoftirqd/2-23      [002] d.s2   190.428752: dwc3_ep_dequeue: ep2out: req 00000000732c0f69 length 0/16384 zsI ==> -115
> ksoftirqd/2-23      [002] dns3   190.447625: dwc3_gadget_ep_cmd: ep2out: cmd 'End Transfer' [40d08] params 00000000 00000000 00000000 --> status: Timed Out
> ksoftirqd/2-23      [002] dns2   190.450596: dwc3_ep_dequeue: ep2out: req 00000000be3434d4 length 0/16384 zsI ==> -115
> ksoftirqd/2-23      [002] dns2   190.450618: dwc3_ep_dequeue: ep2out: req 0000000033655f2e length 0/16384 zsI ==> -115
> ksoftirqd/2-23      [002] dns2   190.450625: dwc3_ep_dequeue: ep2out: req 00000000dc0ff635 length 0/16384 zsI ==> -115
> ksoftirqd/2-23      [002] dns2   190.450633: dwc3_ep_dequeue: ep2out: req 0000000023ffb8e8 length 0/16384 zsI ==> -115
> ksoftirqd/2-23      [002] dns2   190.450641: dwc3_ep_dequeue: ep2out: req 000000005fab61aa length 0/16384 zsI ==> -115
> ksoftirqd/2-23      [002] dns2   190.450648: dwc3_ep_dequeue: ep2out: req 000000001f618cd8 length 0/16384 zsI ==> -115
> ksoftirqd/2-23      [002] dns2   190.450654: dwc3_ep_dequeue: ep2out: req 00000000da0ea1b1 length 0/16384 zsI ==> -115
> ##### CPU 6 buffer started ####

Some traces were lost?

> HwBinder:636_1-658     [006] d..1   190.735324: dwc3_gadget_ep_disable: ep0out: mps 512/512 streams 0 burst 1 ring 0/0 flags e:swbp:>
> HwBinder:636_1-658     [006] d..1   190.735327: dwc3_gadget_ep_disable: ep0in: mps 512/512 streams 0 burst 1 ring 0/0 flags e:swbp:<
> HwBinder:636_1-658     [003] dn.1   214.206652: dwc3_gadget_ep_disable: ep0out: mps 512/512 streams 0 burst 1 ring 0/0 flags e:swbp:>
> HwBinder:636_1-658     [003] dn.1   214.206677: dwc3_gadget_ep_disable: ep0in: mps 512/512 streams 0 burst 1 ring 0/0 flags e:swbp:<
> 
> 2. Kernel log
> 
> [  190.447711]I[2:    ksoftirqd/2:   23] ------------[ cut here ]------------
> [  190.447777]I[2:    ksoftirqd/2:   23] WARNING: CPU: 2 PID: 23 at drivers/usb/dwc3/gadget.c:3632 dwc3_gadget_ep_dequeue+0x228/0x234
> [  190.449587]I[2:    ksoftirqd/2:   23] pc : dwc3_gadget_ep_dequeue+0x228/0x234
> [  190.449610]I[2:    ksoftirqd/2:   23] lr : dwc3_gadget_ep_dequeue+0x124/0x234
> [  190.449629]I[2:    ksoftirqd/2:   23] sp : ffffffc01294bad0
> [  190.449647]I[2:    ksoftirqd/2:   23] x29: ffffffc01294baf0 x28: 0000000000000000 
> [  190.449679]I[2:    ksoftirqd/2:   23] x27: ffffffc0121e4000 x26: ffffff88208c9dc0 
> [  190.449711]I[2:    ksoftirqd/2:   23] x25: ffffff893dd5c570 x24: ffffff893dd22080 
> [  190.449742]I[2:    ksoftirqd/2:   23] x23: ffffff88a4f8a168 x22: ffffff88a4f8a100 
> [  190.449773]I[2:    ksoftirqd/2:   23] x21: ffffff893dd5c500 x20: 0000000000000080 
> [  190.449803]I[2:    ksoftirqd/2:   23] x19: ffffff893dd22238 x18: ffffffc01291d068 
> [  190.449834]I[2:    ksoftirqd/2:   23] x17: 0000000000004000 x16: ffffff88bb0cc000 
> [  190.449864]I[2:    ksoftirqd/2:   23] x15: 0000000000000200 x14: ffffff88208d4d00 
> [  190.449895]I[2:    ksoftirqd/2:   23] x13: ffffff88208d4d00 x12: 00000000000001b5 
> [  190.449926]I[2:    ksoftirqd/2:   23] x11: ffffff88208c9dc0 x10: 0000000100000103 
> [  190.449958]I[2:    ksoftirqd/2:   23] x9 : ffffff88208c9dc0 x8 : 0000000000000008 
> [  190.449989]I[2:    ksoftirqd/2:   23] x7 : fefefefefefefefe x6 : 0000000000008080 
> [  190.450020]I[2:    ksoftirqd/2:   23] x5 : 0000000000000000 x4 : 0000000000000103 
> [  190.450050]I[2:    ksoftirqd/2:   23] x3 : 0000000000000080 x2 : ffffff88208ac900 
> [  190.450081]I[2:    ksoftirqd/2:   23] x1 : ffffff88208ac900 x0 : 00000000ffffff92 
> [  190.450120]I[2:    ksoftirqd/2:   23] Call trace:
> [  190.450153]I[2:    ksoftirqd/2:   23]  dwc3_gadget_ep_dequeue+0x228/0x234
> [  190.450187]I[2:    ksoftirqd/2:   23]  usb_ep_dequeue+0x4c/0xa0
> [  190.450219]I[2:    ksoftirqd/2:   23]  ffs_aio_cancel+0x4c/0x84
> [  190.450247]I[2:    ksoftirqd/2:   23]  free_ioctx_users+0x80/0x110
> [  190.450281]I[2:    ksoftirqd/2:   23]  percpu_ref_put_many+0xf0/0x164
> [  190.450310]I[2:    ksoftirqd/2:   23]  percpu_ref_switch_to_atomic_rcu+0x13c/0x1dc
> [  190.450339]I[2:    ksoftirqd/2:   23]  rcu_do_batch+0x208/0x5b4
> [  190.450360]I[2:    ksoftirqd/2:   23]  rcu_core+0x22c/0x43c
> [  190.450384]I[2:    ksoftirqd/2:   23]  rcu_core_si+0x14/0x24
> [  190.450409]I[2:    ksoftirqd/2:   23]  efi_header_end+0x134/0x350
> [  190.450437]I[2:    ksoftirqd/2:   23]  run_ksoftirqd+0x34/0x74
> 
> The point is there would be some requests pending without giveback-ed.
> They're from removing function driver(ffs_aio_cancel). It could cause
> the function driver would wait for dequeue requests completed.
> 

I may misunderstand the description, or something doesn't match up. If
"removing function driver" means function unbind, then this will trigger
disabling the endpoints and dwc3_remove_requests(). If
dwc3_remove_requests() was called, then the dwc3 driver would give back
the requests regardless whether the End Transfer command was completed.

If this is not the full log, can you send the full driver log. That
would give more clues.

Thanks,
Thinh




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux