On Tue, Feb 15, 2022 at 05:14:50PM +0000, Thinh Nguyen wrote: > Jung Daehwan wrote: > > Hi Thinh, > > > > On Mon, Feb 14, 2022 at 06:44:33PM +0000, Thinh Nguyen wrote: > >> Hi, > >> > >> Daehwan Jung wrote: > >>> It always sets DWC3_EP_END_TRANSFER_PENDING in dwc3_stop_active_transfer > >>> even if dwc3_send_gadget_ep_cmd fails. It can cause some problems like > >> > >> How does it fail? Timed out? > > > > Yes, timed out. > >> > >>> skipping clear stall commmand or giveback from dequeue. We fix to set it > >>> only when ep cmd success. Additionally, We clear DWC3_EP_TRANSFER_STARTED > >>> for next trb to start transfer not update transfer. > >> > >> We shouldn't do this. Things will be out of sync. It may work for 1 > >> scenario, but it won't work for others. > >> > >> Please help me understand a few things: > >> > >> 1) What is the scenario that triggers this? Is it random? > >> > > ep cmd timeout occurs on dequeue request from user side. End Transfer command > > would be sent in dwc3_stop_active transfer. > > At the high level, what's triggering the request dequeue? Is it from a > disconnect, change of interface, or simply function driver protocol that > changes it. > > What application was used to trigger this? > Removing function driver requests dequeue. > > > >> 2) Are there other traffics pending while issuing the End Transfer > >> command? If so, what transfer type(s)? > >> > > I haven't checked it yet. > > Can you check? > It seems there's no pending traffic. > > > >> 3) Have you tried increasing the timeout? > >> > > No, I haven't. > > Can you try up to 10 seconds (just for experiment) > Yes, it's reproduced on 10 secs. > >> BR, > >> Thinh > >> > > > > This issue occurs very rarely on customer. I only have restricted > > information. That's why I've been trying to reproduce it. > > How did you test your fix if you can't reproduce it? > I sent patches to my client and he said this issue would not occur after applying them. In fact, They are not totally equal with those we are discussing. I've refactor-ed and pushed them here. > > > > Wesley may have run into same issue and you can see this issue in detail. > > https://protect2.fireeye.com/v1/url?k=39bd3e50-66260725-39bcb51f-0cc47a31ce4e-bf000d5a697ead66&q=1&e=b04500ad-43d0-4ee7-b5f0-a7283d8fc0da&u=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fprotect2.fireeye.com%2Fv1%2Furl%3Fk%3D9d423b69-fc3fd32e-9d43b026-74fe485fff30-77a099b52659410d%26q%3D1%26e%3D20b4d9f5-2599-4f57-8b6a-7c4ec167d228%26u%3Dhttps%2A3A%2A2F%2A2Flore.kernel.org%2A2Flinux-usb%2A2F20220203080017.27339-1-quic_wcheng%2A40quicinc.com%2A2F__%3BJSUlJSUlJQ%21%21A4F2R9G_pg%21JWPzNLoO3BFX_IZCVzmHPtxq6frr_VFbSNNaxSQylunt1Y4EauTOefth2LCIcVEuTx8E%24 > > > > I can take a look, but please provide the tracepoints of the failure if > you can reproduce it. > Thanks to client's help, I could reproduce it in my envrionment. 1. Trace MtpServer-2484 [003] d..2 114.376829: dwc3_ep_queue: ep1out: req 0000000055561867 length 0/512 zsI ==> -115 MtpServer-2484 [003] d..2 114.376838: dwc3_prepare_trb: ep1out: trb 0000000074078a95 (E23:D22) buf 0000000954845600 size 512 ctrl 0000001d (HlCS:sc:normal) MtpServer-2484 [003] d..2 114.376844: dwc3_prepare_trb: ep1out: trb 0000000095f503e7 (E24:D22) buf 00000000e1303000 size 512 ctrl 00000819 (HlcS:sC:normal) MtpServer-2484 [003] d..2 114.376867: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful ksoftirqd/2-23 [002] d.s2 190.428752: dwc3_ep_dequeue: ep2out: req 00000000732c0f69 length 0/16384 zsI ==> -115 ksoftirqd/2-23 [002] dns3 190.447625: dwc3_gadget_ep_cmd: ep2out: cmd 'End Transfer' [40d08] params 00000000 00000000 00000000 --> status: Timed Out ksoftirqd/2-23 [002] dns2 190.450596: dwc3_ep_dequeue: ep2out: req 00000000be3434d4 length 0/16384 zsI ==> -115 ksoftirqd/2-23 [002] dns2 190.450618: dwc3_ep_dequeue: ep2out: req 0000000033655f2e length 0/16384 zsI ==> -115 ksoftirqd/2-23 [002] dns2 190.450625: dwc3_ep_dequeue: ep2out: req 00000000dc0ff635 length 0/16384 zsI ==> -115 ksoftirqd/2-23 [002] dns2 190.450633: dwc3_ep_dequeue: ep2out: req 0000000023ffb8e8 length 0/16384 zsI ==> -115 ksoftirqd/2-23 [002] dns2 190.450641: dwc3_ep_dequeue: ep2out: req 000000005fab61aa length 0/16384 zsI ==> -115 ksoftirqd/2-23 [002] dns2 190.450648: dwc3_ep_dequeue: ep2out: req 000000001f618cd8 length 0/16384 zsI ==> -115 ksoftirqd/2-23 [002] dns2 190.450654: dwc3_ep_dequeue: ep2out: req 00000000da0ea1b1 length 0/16384 zsI ==> -115 ##### CPU 6 buffer started #### HwBinder:636_1-658 [006] d..1 190.735324: dwc3_gadget_ep_disable: ep0out: mps 512/512 streams 0 burst 1 ring 0/0 flags e:swbp:> HwBinder:636_1-658 [006] d..1 190.735327: dwc3_gadget_ep_disable: ep0in: mps 512/512 streams 0 burst 1 ring 0/0 flags e:swbp:< HwBinder:636_1-658 [003] dn.1 214.206652: dwc3_gadget_ep_disable: ep0out: mps 512/512 streams 0 burst 1 ring 0/0 flags e:swbp:> HwBinder:636_1-658 [003] dn.1 214.206677: dwc3_gadget_ep_disable: ep0in: mps 512/512 streams 0 burst 1 ring 0/0 flags e:swbp:< 2. Kernel log [ 190.447711]I[2: ksoftirqd/2: 23] ------------[ cut here ]------------ [ 190.447777]I[2: ksoftirqd/2: 23] WARNING: CPU: 2 PID: 23 at drivers/usb/dwc3/gadget.c:3632 dwc3_gadget_ep_dequeue+0x228/0x234 [ 190.449587]I[2: ksoftirqd/2: 23] pc : dwc3_gadget_ep_dequeue+0x228/0x234 [ 190.449610]I[2: ksoftirqd/2: 23] lr : dwc3_gadget_ep_dequeue+0x124/0x234 [ 190.449629]I[2: ksoftirqd/2: 23] sp : ffffffc01294bad0 [ 190.449647]I[2: ksoftirqd/2: 23] x29: ffffffc01294baf0 x28: 0000000000000000 [ 190.449679]I[2: ksoftirqd/2: 23] x27: ffffffc0121e4000 x26: ffffff88208c9dc0 [ 190.449711]I[2: ksoftirqd/2: 23] x25: ffffff893dd5c570 x24: ffffff893dd22080 [ 190.449742]I[2: ksoftirqd/2: 23] x23: ffffff88a4f8a168 x22: ffffff88a4f8a100 [ 190.449773]I[2: ksoftirqd/2: 23] x21: ffffff893dd5c500 x20: 0000000000000080 [ 190.449803]I[2: ksoftirqd/2: 23] x19: ffffff893dd22238 x18: ffffffc01291d068 [ 190.449834]I[2: ksoftirqd/2: 23] x17: 0000000000004000 x16: ffffff88bb0cc000 [ 190.449864]I[2: ksoftirqd/2: 23] x15: 0000000000000200 x14: ffffff88208d4d00 [ 190.449895]I[2: ksoftirqd/2: 23] x13: ffffff88208d4d00 x12: 00000000000001b5 [ 190.449926]I[2: ksoftirqd/2: 23] x11: ffffff88208c9dc0 x10: 0000000100000103 [ 190.449958]I[2: ksoftirqd/2: 23] x9 : ffffff88208c9dc0 x8 : 0000000000000008 [ 190.449989]I[2: ksoftirqd/2: 23] x7 : fefefefefefefefe x6 : 0000000000008080 [ 190.450020]I[2: ksoftirqd/2: 23] x5 : 0000000000000000 x4 : 0000000000000103 [ 190.450050]I[2: ksoftirqd/2: 23] x3 : 0000000000000080 x2 : ffffff88208ac900 [ 190.450081]I[2: ksoftirqd/2: 23] x1 : ffffff88208ac900 x0 : 00000000ffffff92 [ 190.450120]I[2: ksoftirqd/2: 23] Call trace: [ 190.450153]I[2: ksoftirqd/2: 23] dwc3_gadget_ep_dequeue+0x228/0x234 [ 190.450187]I[2: ksoftirqd/2: 23] usb_ep_dequeue+0x4c/0xa0 [ 190.450219]I[2: ksoftirqd/2: 23] ffs_aio_cancel+0x4c/0x84 [ 190.450247]I[2: ksoftirqd/2: 23] free_ioctx_users+0x80/0x110 [ 190.450281]I[2: ksoftirqd/2: 23] percpu_ref_put_many+0xf0/0x164 [ 190.450310]I[2: ksoftirqd/2: 23] percpu_ref_switch_to_atomic_rcu+0x13c/0x1dc [ 190.450339]I[2: ksoftirqd/2: 23] rcu_do_batch+0x208/0x5b4 [ 190.450360]I[2: ksoftirqd/2: 23] rcu_core+0x22c/0x43c [ 190.450384]I[2: ksoftirqd/2: 23] rcu_core_si+0x14/0x24 [ 190.450409]I[2: ksoftirqd/2: 23] efi_header_end+0x134/0x350 [ 190.450437]I[2: ksoftirqd/2: 23] run_ksoftirqd+0x34/0x74 The point is there would be some requests pending without giveback-ed. They're from removing function driver(ffs_aio_cancel). It could cause the function driver would wait for dequeue requests completed. Best Regards, Jung Daehwan > Thanks, > Thinh