On Fri, 24 Nov 2017 14:04:35 +0000, Vincent Pelletier <plr.vincent@xxxxxxxxx> wrote: > What seem to be the relevant pieces are: > - at least one AIO transfers submitted for reading from EP2OUT > - upon receiving data from stdin, a synchronous write happens on EP2IN, > which blocks if host did not submit a transfer (normal) > - SIGQUIT to interrupt the write while it's blocking I could simplify the issue requirements: it is not required to use synchronous IO. Submitting AIO and killing functionfs-interfaced process is sufficient. This time, I kill with signal 15, which makes the process exit without (in my understanding) returning control to python (no sighandler set). For some reason, I could now get kgdb to start with modified module loaded, and inspect the status a bit more. To my surprise, the error symptom do not seem to change: (gdb) lx-dmesg [...] [ 382.505547] BUG: scheduling while atomic: swapper/1/0/0x00000100 [ 382.511767] 3 locks held by swapper/1/0: [ 382.515903] #0: (rcu_callback){....}, at: [<c10b4ff0>] rcu_process_callbacks+0x260/0x440 [ 382.524572] #1: (rcu_read_lock_sched){....}, at: [<c1358ba0>] percpu_ref_switch_to_atomic_rcu+0xb0/0x130 [ 382.534650] #2: (&(&ctx->ctx_lock)->rlock){....}, at: [<c11f0c73>] free_ioctx_users+0x23/0xd0 [ 382.534699] Modules linked in: usb_f_fs libcomposite configfs bnep btsdio bluetooth ecdh_generic brcmfmac brcmutil dwc3 ulpi intel_powerclamp udc_core coretemp kvm_intel kvm irqbypass crc32_pclmul crc32c_intel pcbc dwc3_pci aesni_intel aes_i586 crypto_simd cryptd ehci_pci ehci_hcd basincove_gpadc gpio_keys industrialio usbcore usb_common [ 382.534945] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.0-edison+ #117 [ 382.534955] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48 [ 382.534963] Call Trace: [ 382.534976] <SOFTIRQ> [ 382.535004] dump_stack+0x47/0x62 [ 382.535034] __schedule_bug+0x61/0x80 [ 382.535057] __schedule+0x43/0x7a0 [ 382.535116] schedule+0x5f/0x70 [ 382.535155] dwc3_gadget_ep_dequeue+0x14c/0x270 [dwc3] [ 382.535196] ? do_wait_intr_irq+0x70/0x70 [ 382.535248] usb_ep_dequeue+0x19/0x90 [udc_core] [ 382.535292] ffs_aio_cancel+0x22/0x30 [usb_f_fs] [ 382.535312] kiocb_cancel+0x31/0x40 [ 382.535333] free_ioctx_users+0x4d/0xd0 [ 382.535367] percpu_ref_switch_to_atomic_rcu+0x10a/0x130 [ 382.535389] ? percpu_ref_exit+0x40/0x40 [ 382.535410] rcu_process_callbacks+0x2b3/0x440 [ 382.535460] ? __softirqentry_text_start+0x8/0x8 [ 382.535476] __do_softirq+0xf8/0x26b [ 382.535517] ? __softirqentry_text_start+0x8/0x8 [ 382.535538] do_softirq_own_stack+0x22/0x30 [ 382.535547] </SOFTIRQ> [ 382.535573] irq_exit+0x45/0xc0 [ 382.535591] smp_apic_timer_interrupt+0x13c/0x150 [ 382.535618] apic_timer_interrupt+0x35/0x3c [ 382.535637] EIP: cpuidle_enter_state+0x26b/0x330 [ 382.535647] EFLAGS: 00200202 CPU: 1 [ 382.535657] EAX: 00000000 EBX: c1bfd680 ECX: 00000002 EDX: 00000000 [ 382.535668] ESI: 00016e40 EDI: 00000059 EBP: f6911f60 ESP: f6911f2c [ 382.535679] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 [ 382.535792] cpuidle_enter+0x14/0x20 [ 382.535815] call_cpuidle+0x35/0x40 [ 382.535836] do_idle+0x114/0x180 [ 382.535862] cpu_startup_entry+0x25/0x30 [ 382.535884] start_secondary+0x191/0x1a0 [ 382.535915] startup_32_smp+0x164/0x166 [ 382.536327] BUG: unable to handle kernel NULL pointer dereference at (null) [ 382.536351] IP: 0xf6909f43 [ 382.536357] *pde = 00000000 [ 382.536372] Oops: 0000 [#1] SMP [ 393.811964] WARNING: stack recursion on stack type 3 I tried to confirm I'm running the modified kernel: (gdb) disassemble ffs_aio_cancel Dump of assembler code for function ffs_aio_cancel: 0xf819c6f0 <+0>: lea %ds:0x0(%esi,%eiz,1),%esi 0xf819c6f5 <+5>: mov 0x10(%eax),%edx 0xf819c6f8 <+8>: test %edx,%edx 0xf819c6fa <+10>: je 0xf819c718 <ffs_aio_cancel+40> 0xf819c6fc <+12>: mov 0x4c(%edx),%eax 0xf819c6ff <+15>: test %eax,%eax 0xf819c701 <+17>: je 0xf819c718 <ffs_aio_cancel+40> 0xf819c703 <+19>: mov 0x50(%edx),%edx 0xf819c706 <+22>: test %edx,%edx 0xf819c708 <+24>: je 0xf819c718 <ffs_aio_cancel+40> 0xf819c70a <+26>: push %ebp 0xf819c70b <+27>: mov %esp,%ebp 0xf819c70d <+29>: call 0xf80ba220 <usb_ep_dequeue> 0xf819c712 <+34>: pop %ebp 0xf819c713 <+35>: ret 0xf819c714 <+36>: lea 0x0(%esi,%eiz,1),%esi 0xf819c718 <+40>: mov $0xffffffea,%eax 0xf819c71d <+45>: ret End of assembler dump. Although I do not speak x86 fluently, I would expect several more instructions or calls if IRQs were being disabled, so I believe I am running the modified version. I was surprised to see a high-pid process about dwc3: (gdb) info threads [...] 105 Thread 1852 (htop) 0x00000000 in ?? () <- from where I killed python (signal 15) 106 Thread 1879 (strace) 0x00000000 in ?? () <- strace python [...] 107 Thread 1881 (python) 0x00000000 in ?? () <- python, interfacing with functionfs 108 Thread 1882 (irq/34-dwc3) 0x00000000 in ?? () So I grabbed the backtraces of both the dwc3 thread and python: (gdb) t 108 [Switching to thread 108 (Thread 1882)] #0 0x00000000 in ?? () (gdb) bt #0 0x00000000 in ?? () #1 0xf41cd340 in ?? () #2 0xc18474af in schedule () at kernel/sched/core.c:3425 #3 0xc10a99c5 in irq_wait_for_interrupt (action=<optimized out>) at kernel/irq/manage.c:768 #4 irq_thread (data=0x0) at kernel/irq/manage.c:972 #5 0xc106d5e1 in kthread (_create=0xf4f52c00) at kernel/kthread.c:231 #6 0xc184cc41 in ret_from_fork () at arch/x86/entry/entry_32.S:293 Normal IRQ handler, I guess. Maybe it only has a higher pid because gadget became necessarily active after python script opened endpoints, and handler would only be forked when it becomes needed ? (gdb) t 107 [Switching to thread 107 (Thread 1881)] #0 0x00000000 in ?? () (gdb) bt #0 0x00000000 in ?? () #1 0xf568d600 in ?? () #2 0xc18474af in schedule () at kernel/sched/core.c:3425 #3 0xc184b5ca in schedule_timeout (timeout=<optimized out>) at kernel/time/timer.c:1720 #4 0xc184848b in do_wait_for_common (state=<optimized out>, timeout=2147483647, action=<optimized out>, x=<optimized out>) at kernel/sched/completion.c:91 #5 __wait_for_common (state=<optimized out>, timeout=<optimized out>, action=<optimized out>, x=<optimized out>) at kernel/sched/completion.c:112 #6 wait_for_common (state=<optimized out>, timeout=<optimized out>, x=<optimized out>) at kernel/sched/completion.c:123 #7 wait_for_completion (x=0xf4e25df8) at kernel/sched/completion.c:144 #8 0xc11f0e8a in exit_aio (mm=0xf568d600) at fs/aio.c:903 #9 0xc104e8a8 in __mmput (mm=<optimized out>) at kernel/fork.c:925 #10 0xc1055da2 in exit_mm () at kernel/exit.c:544 #11 do_exit (code=<optimized out>) at kernel/exit.c:852 #12 0xc10564c7 in do_group_exit (exit_code=15) at kernel/exit.c:968 #13 0xc1060463 in get_signal (ksig=<optimized out>) at kernel/signal.c:2334 #14 0xc101888e in do_signal (regs=0xf4e25fb4) at arch/x86/kernel/signal.c:809 #15 0xc1000fd3 in exit_to_usermode_loop (regs=0xf4e25fb4, cached_flags=7) at arch/x86/entry/common.c:158 #16 0xc10018c1 in prepare_exit_to_usermode (regs=<optimized out>) at arch/x86/entry/common.c:197 #17 syscall_return_slowpath (regs=<optimized out>) at arch/x86/entry/common.c:266 #18 do_syscall_32_irqs_on (regs=<optimized out>) at arch/x86/entry/common.c:335 #19 do_fast_syscall_32 (regs=0xf4e25fb4) at arch/x86/entry/common.c:391 #20 0xc184ccb3 in entry_SYSENTER_32 () at arch/x86/entry/entry_32.S:444 #21 0x00000008 in ?? () #22 0x01cad558 in ?? () #23 0x000003ff in tlv_data_is_valid (hdev=<optimized out>, adv_flags=4294967295, data=0x0, len=<error reading variable: Cannot access memory at address 0xbf80543c>, is_adv_data=8) at net/bluetooth/mgmt.c:6092 Backtrace stopped: previous frame inner to this frame (corrupt stack?) I'll try to continue debugging tomorrow, but pointers are very welcome. Regards, -- Vincent Pelletier -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html