While running reboot tests w/ a specific set of USB devices (and slub_debug enabled), I found that once every few hours my device would be crashed with a stack that looked like this: [ 14.012445] BUG: spinlock bad magic on CPU#0, modprobe/2091 [ 14.012460] lock: 0xffffffc0cb055978, .magic: ffffffc0, .owner: cryption contexts: %lu/%lu [ 14.012460] /1025536097, .owner_cpu: 0 [ 14.012466] CPU: 0 PID: 2091 Comm: modprobe Not tainted 4.4.79 #352 [ 14.012468] Hardware name: Google Kevin (DT) [ 14.012471] Call trace: [ 14.012483] [<....>] dump_backtrace+0x0/0x160 [ 14.012487] [<....>] show_stack+0x20/0x28 [ 14.012494] [<....>] dump_stack+0xb4/0xf0 [ 14.012500] [<....>] spin_dump+0x8c/0x98 [ 14.012504] [<....>] spin_bug+0x30/0x3c [ 14.012508] [<....>] do_raw_spin_lock+0x40/0x164 [ 14.012515] [<....>] _raw_spin_lock_irqsave+0x64/0x74 [ 14.012521] [<....>] __wake_up+0x2c/0x60 [ 14.012528] [<....>] async_completed+0x2d0/0x300 [ 14.012534] [<....>] __usb_hcd_giveback_urb+0xc4/0x138 [ 14.012538] [<....>] usb_hcd_giveback_urb+0x54/0xf0 [ 14.012544] [<....>] xhci_irq+0x1314/0x1348 [ 14.012548] [<....>] usb_hcd_irq+0x40/0x50 [ 14.012553] [<....>] handle_irq_event_percpu+0x1b4/0x3f0 [ 14.012556] [<....>] handle_irq_event+0x4c/0x7c [ 14.012561] [<....>] handle_fasteoi_irq+0x158/0x1c8 [ 14.012564] [<....>] generic_handle_irq+0x30/0x44 [ 14.012568] [<....>] __handle_domain_irq+0x90/0xbc [ 14.012572] [<....>] gic_handle_irq+0xcc/0x18c Investigation using kgdb() found that the wait queue that was passed into wake_up() had been freed (it was filled with slub_debug poison). I analyzed and instrumented the code and reproduced. My current belief is that this is happening: 1. async_completed() is called (from IRQ). Moves "as" onto the completed list. 2. On another CPU, proc_reapurbnonblock_compat() calls async_getcompleted(). Blocks on spinlock. 3. async_completed() releases the lock; keeps running; gets blocked midway through wake_up(). 4. proc_reapurbnonblock_compat() => async_getcompleted() gets the lock; removes "as" from completed list and frees it. 5. usbdev_release() is called. Frees "ps". 6. async_completed() finally continues running wake_up(). ...but wake_up() has a pointer to the freed "ps". The instrumentation that led me to believe this was based on adding some trace_printk() calls in a select few functions and then using kdb's "ftdump" at crash time. The trace follows (NOTE: in the trace below I cheated a little bit and added a udelay(1000) in async_completed() after releasing the spinlock because I wanted it to trigger quicker): <...>-2104 0d.h2 13759034us!: async_completed at start: as=ffffffc0cc638200 mtpd-2055 3.... 13759356us : async_getcompleted before spin_lock_irqsave mtpd-2055 3d..1 13759362us : async_getcompleted after list_del_init: as=ffffffc0cc638200 mtpd-2055 3.... 13759371us+: proc_reapurbnonblock_compat: free_async(ffffffc0cc638200) mtpd-2055 3.... 13759422us+: async_getcompleted before spin_lock_irqsave mtpd-2055 3.... 13759479us : usbdev_release at start: ps=ffffffc0cc042080 mtpd-2055 3.... 13759487us : async_getcompleted before spin_lock_irqsave mtpd-2055 3.... 13759497us!: usbdev_release after kfree(ps): ps=ffffffc0cc042080 <...>-2104 0d.h2 13760294us : async_completed before wake_up(): as=ffffffc0cc638200 To fix this problem we can just move the wake_up() under the ps->lock. There should be no issues there that I'm aware of. Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx> --- NOTE: All testing of this patch was done on the Chromium OS kernel version 4.4. The machine tested only barely boots upstream Linux and it seemed unlikely I'd be able to reproduce the exact failure there, so I didn't try. The issue seems to clearly apply to upstream based on code analysis, though. drivers/usb/core/devio.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c index ebe27595c4af..0ff0feddfd1f 100644 --- a/drivers/usb/core/devio.c +++ b/drivers/usb/core/devio.c @@ -623,6 +623,8 @@ static void async_completed(struct urb *urb) if (as->status < 0 && as->bulk_addr && as->status != -ECONNRESET && as->status != -ENOENT) cancel_bulk_urbs(ps, as->bulk_addr); + + wake_up(&ps->wait); spin_unlock(&ps->lock); if (signr) { @@ -630,8 +632,6 @@ static void async_completed(struct urb *urb) put_pid(pid); put_cred(cred); } - - wake_up(&ps->wait); } static void destroy_async(struct usb_dev_state *ps, struct list_head *list) -- 2.14.0.434.g98096fd7a8-goog -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html