On Wed, Jan 16, 2019 at 3:20 PM Vincent Pelletier <evgreen@xxxxxxxxxxxx> wrote:
>
> This bug happens only when the UDC needs to sleep during usb_ep_dequeue,
> as is the case for (at least) dwc3.
>
> [ 382.200896] BUG: scheduling while atomic: screen/1808/0x00000100
> [ 382.207124] 4 locks held by screen/1808:
> [ 382.211266] #0: (rcu_callback){....}, at: [<c10b4ff0>] rcu_process_callbacks+0x260/0x440
> [ 382.219949] #1: (rcu_read_lock_sched){....}, at: [<c1358ba0>] percpu_ref_switch_to_atomic_rcu+0xb0/0x130
> [ 382.230034] #2: (&(&ctx->ctx_lock)->rlock){....}, at: [<c11f0c73>] free_ioctx_users+0x23/0xd0
> [ 382.230096] #3: (&(&ffs->eps_lock)->rlock){....}, at: [<f81e7710>] ffs_aio_cancel+0x20/0x60 [usb_f_fs]
> [ 382.230160] Modules linked in: usb_f_fs libcomposite configfs bnep btsdio bluetooth ecdh_generic brcmfmac brcmutil intel_powerclamp coretemp dwc3 kvm_intel ulpi udc_core kvm irqbypass crc32_pclmul crc32c_intel pcbc dwc3_pci aesni_intel aes_i586 crypto_simd cryptd ehci_pci ehci_hcd gpio_keys usbcore basincove_gpadc industrialio usb_common
> [ 382.230407] CPU: 1 PID: 1808 Comm: screen Not tainted 4.14.0-edison+ #117
> [ 382.230416] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
> [ 382.230425] Call Trace:
> [ 382.230438] <SOFTIRQ>
> [ 382.230466] dump_stack+0x47/0x62
> [ 382.230498] __schedule_bug+0x61/0x80
> [ 382.230522] __schedule+0x43/0x7a0
> [ 382.230587] schedule+0x5f/0x70
> [ 382.230625] dwc3_gadget_ep_dequeue+0x14c/0x270 [dwc3]
> [ 382.230669] ? do_wait_intr_irq+0x70/0x70
> [ 382.230724] usb_ep_dequeue+0x19/0x90 [udc_core]
> [ 382.230770] ffs_aio_cancel+0x37/0x60 [usb_f_fs]
> [ 382.230798] kiocb_cancel+0x31/0x40
> [ 382.230822] free_ioctx_users+0x4d/0xd0
> [ 382.230858] percpu_ref_switch_to_atomic_rcu+0x10a/0x130
> [ 382.230881] ? percpu_ref_exit+0x40/0x40
> [ 382.230904] rcu_process_callbacks+0x2b3/0x440
> [ 382.230965] __do_softirq+0xf8/0x26b
> [ 382.231011] ? __softirqentry_text_start+0x8/0x8
> [ 382.231033] do_softirq_own_stack+0x22/0x30
> [ 382.231042] </SOFTIRQ>
> [ 382.231071] irq_exit+0x45/0xc0
> [ 382.231089] smp_apic_timer_interrupt+0x13c/0x150
> [ 382.231118] apic_timer_interrupt+0x35/0x3c
> [ 382.231132] EIP: __copy_user_ll+0xe2/0xf0
> [ 382.231142] EFLAGS: 00210293 CPU: 1
> [ 382.231154] EAX: bfd4508c EBX: 00000004 ECX: 00000003 EDX: f3d8fe50
> [ 382.231165] ESI: f3d8fe51 EDI: bfd4508d EBP: f3d8fe14 ESP: f3d8fe08
> [ 382.231176] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> [ 382.231265] core_sys_select+0x25f/0x320
> [ 382.231346] ? __wake_up_common_lock+0x62/0x80
> [ 382.231399] ? tty_ldisc_deref+0x13/0x20
> [ 382.231438] ? ldsem_up_read+0x1b/0x40
> [ 382.231459] ? tty_ldisc_deref+0x13/0x20
> [ 382.231479] ? tty_write+0x29f/0x2e0
> [ 382.231514] ? n_tty_ioctl+0xe0/0xe0
> [ 382.231541] ? tty_write_unlock+0x30/0x30
> [ 382.231566] ? __vfs_write+0x22/0x110
> [ 382.231604] ? security_file_permission+0x2f/0xd0
> [ 382.231635] ? rw_verify_area+0xac/0x120
> [ 382.231677] ? vfs_write+0x103/0x180
> [ 382.231711] SyS_select+0x87/0xc0
> [ 382.231739] ? SyS_write+0x42/0x90
> [ 382.231781] do_fast_syscall_32+0xd6/0x1a0
> [ 382.231836] entry_SYSENTER_32+0x47/0x71
> [ 382.231848] EIP: 0xb7f75b05
> [ 382.231857] EFLAGS: 00000246 CPU: 1
> [ 382.231868] EAX: ffffffda EBX: 00000400 ECX: bfd4508c EDX: bfd4510c
> [ 382.231878] ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: bfd45020
> [ 382.231889] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
> [ 382.232281] softirq: huh, entered softirq 9 RCU c10b4d90 with preempt_count 00000100, exited with 00000000?
>
> Signed-off-by: Vincent Pelletier <plr.vincent@xxxxxxxxx>
> Tested-by: Sam Protsenko <semen.protsenko@xxxxxxxxxx>
> Signed-off-by: he, bo <bo.he@xxxxxxxxx>
> ---
> drivers/usb/gadget/function/f_fs.c | 26 ++++++++++++++++++--------
> 1 file changed, 18 insertions(+), 8 deletions(-)
>

Hi Vincent,

We finally caught up to both the apply and the revert of this change, and
are now experiencing the issue that this patch originally tried to fix.
Is anybody still looking at this issue?

Before I saw all this on the lists I was doing some thinking about how
to make dwc3_gadget_ep_dequeue not sleep. Basically that would mean
spinning without the sleep somehow. This seemed to get pretty tricky
given what appears to be a queue-like nature for dwc3 interrupts (I am
not at all familiar with dwc3). You'd have to go chase down where the
interrupt could be, either in the hardware or in the software queue.

But then I wondered about the original reason for needing to wait for
the transfer completion in order to remove all the TRBs. Is this because
we're worried that the hardware will be sitting on top of a TRB we're
removing, so that we free it and corrupt the next pointer, and then the
hardware follows it somewhere crazy? Does DWC3 have a register for
seeing which TRB is currently being processed? If so, could we have a
while loop near where the _HWO bit is cleared, to make sure hardware is
not looking at each TRB we are clearing out? Or maybe more simply, is
there a way to stop the whole machine and then restart it in a graceful
way?

-Evan

P.S. Apologies for replying to the original message and not the end of
the thread. I had to bounce the message into my inbox, and couldn't
figure out how to have Patchwork give me the full thread.
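
To make the "while loop near clearing the _HWO bit" idea a bit more
concrete, here is a rough user-space sketch of the shape I have in mind.
Everything in it is made up for illustration: sim_ring, sim_trb,
hw_index, sim_hw_step and sim_reclaim_trbs are hypothetical names, the
HWO bit position is assumed, and hw_index stands in for a "current TRB"
register that I do not know DWC3 actually provides. It is not dwc3 code.
The check is also racy on real hardware (the controller could step onto
a TRB right after we look), so this only shows a bounded, non-sleeping
wait, not a proven fix.

```c
#include <stdbool.h>
#include <stdint.h>

#define TRB_CTRL_HWO (1u << 0) /* "hardware owns" flag; bit position assumed here */
#define RING_SIZE    8

struct sim_trb {
	uint32_t ctrl;
};

struct sim_ring {
	struct sim_trb trbs[RING_SIZE];
	volatile uint32_t hw_index; /* HYPOTHETICAL "TRB currently being processed" register */
};

/*
 * Stand-in for the hardware making progress. On real hardware the index
 * would advance on its own; the simulation has to push it along.
 */
static void sim_hw_step(struct sim_ring *ring)
{
	ring->hw_index = (ring->hw_index + 1) % RING_SIZE;
}

/*
 * Clear HWO on `count` TRBs starting at `first`, never sleeping.
 * If the (simulated) hardware is sitting on the TRB we want to clear,
 * spin up to `max_spins` times waiting for it to move on.
 * Returns false if the hardware did not move within the spin budget.
 */
static bool sim_reclaim_trbs(struct sim_ring *ring, uint32_t first,
			     uint32_t count, unsigned int max_spins)
{
	for (uint32_t n = 0; n < count; n++) {
		uint32_t i = (first + n) % RING_SIZE;
		unsigned int spins = 0;

		while (ring->hw_index == i) {
			if (spins++ >= max_spins)
				return false;
			/* A driver would just relax the CPU here; we simulate progress. */
			sim_hw_step(ring);
		}
		ring->trbs[i].ctrl &= ~TRB_CTRL_HWO;
	}
	return true;
}
```

The spin budget is there so the loop cannot hang in atomic context if
the hardware never moves; what to do on timeout (the false return) is
exactly the part I don't know how to answer for dwc3.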