Hi, I think there's a race in the cdc-acm driver which may cause the 'acm' object to be freed while it's still being used. This typically happens when the device is disconnected at roughly the same time as the tty device is closed. I've included the crash dump below, slighly redacted. The instructions leading up to the crash are: d4a: e8 00 00 00 00 callq d4f <acm_port_down+0x1f> d4b: R_X86_64_PC32 mutex_lock+0xfffffffffffffffc d4f: 48 83 3b 00 cmpq $0x0,(%rbx) d53: 0f 84 bd 00 00 00 je e16 <acm_port_down+0xe6> d59: 48 8b 43 08 mov 0x8(%rbx),%rax d5d: 48 8b 3b mov (%rbx),%rdi d60: 45 31 c0 xor %r8d,%r8d d63: c7 83 fc 06 00 00 00 movl $0x0,0x6fc(%rbx) d6a: 00 00 00 d6d: b9 21 00 00 00 mov $0x21,%ecx d72: ba 22 00 00 00 mov $0x22,%edx d77: 45 31 e4 xor %r12d,%r12d d7a: 48 8b 00 mov (%rax),%rax At this point, %rbx contains acm, %rax contains acm->control and %rdi contains acm->dev. The register dump below shows that both %rax and %rdi contain 0x6b6b6b... which is the POISON_FREE pattern (I'm running with SLAB debugging enabled). So acm must have been freed at some point before this. What seems to be happening is that acm_tty_close() is called in parallel with acm_disconnect(). tty_port_close_start() runs, decrements acm->port.count to 0, and returns 1. In the mean time, acm_disconnect() grabs the open_mutex so that acm_port_down() has to wait. Since acm->port.count has reached 0, it calls acm_tty_unregister(), which frees acm, and unlocks the mutex. At this point, acm_port_down() gets hold of open_mutex and since acm has been freed and poisoned, acm->dev will be non-NULL so it will continue using acm and eventually explode when trying to dereference one of the pointers. This race is very difficult to trigger. SLAB debugging makes it somewhat easier (i.e. you don't have to trigger the race _and_ have someone else allocate the old acm object and initialize it with unrelated data). To confirm my analysis, I added a call to mdelay(200) in acm_tty_close() right before the call to acm_port_down(), and another mdelay(100) at the beginning of acm_disconnect(), essentially making the suspected race window much larger. This makes the bug almost 100% reproducible. I would like some suggestions on the best way to fix this. One way might be to hold open_mutex throughout acm_tty_close(). I'm not sure if this is safe with regards to what tty_port_close_start() might do, and I kind of feel open_mutex has been abused enough already. Another way would be to make use of the kref embedded inside acm->port and give the tty side and the usb side a reference each, freeing the object when both are released. This seems cleaner, but I'm very unsure about where the appropriate place to drop the tty reference would be. Thoughts? Btw, it looks like a very similar issue has been reported before: https://lkml.org/lkml/2011/5/29/64 Best regards, Havard Skinnemoen [ 1723.680854] general protection fault: 0000 [#1] SMP [ 1723.681006] CPU 0 [ 1723.681006] Modules linked in: cdc_acm snd_hda_codec_hdmi snd_hda_codec_via snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd snd_page_alloc it8712f_wdt msr cpuid e1000e ipv6 genrtc [ 1723.681006] [ 1723.681006] Pid: 11739, comm: acmtest Not tainted 3.1.0-custom #4 [ 1723.681006] RIP: 0010:[<ffffffffa0079d7a>] [<ffffffffa0079d7a>] acm_port_down+0x4a/0x100 [cdc_acm] [ 1723.681006] RSP: 0018:ffff880009c5dc98 EFLAGS: 00010246 [ 1723.681006] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88002d2aab20 RCX: 0000000000000021 [ 1723.681006] RDX: 0000000000000022 RSI: 000000000481b330 RDI: 6b6b6b6b6b6b6b6b [ 1723.681006] RBP: ffff880009c5dcc8 R08: 0000000000000000 R09: 0000000000000000 [ 1723.681006] R10: ffff8800286eed38 R11: 0000000000000000 R12: 0000000000000000 [ 1723.681006] R13: ffff88002d2aab38 R14: ffffffffffffffff R15: 0000000000000000 [ 1723.681006] FS: 00000000008c2850(0000) GS:ffff88002fc00000(0000) knlGS:00000000f64466d0 [ 1723.681006] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1723.681006] CR2: 0000000000467a20 CR3: 0000000001806000 CR4: 00000000000006f0 [ 1723.681006] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1723.681006] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1723.681006] Process acmtest (pid: 11739, threadinfo ffff880009c5c000, task ffff880000172000) [ 1723.681006] Stack: [ 1723.681006] 0000000000000246 ffff880009c5dcb8 0000000000000018 ffff88002d2aab38 [ 1723.681006] ffff88002d2aab20 ffff88002ac630b8 ffff880009c5dcf8 ffffffffa0079ebc [ 1723.681006] ffff88002ac630b8 ffff8800286eed28 0000000000000000 ffff880009d143e0 [ 1723.681006] Call Trace: [ 1723.681006] [<ffffffffa0079ebc>] acm_tty_close+0x5c/0xd0 [cdc_acm] [ 1723.681006] [<ffffffff813835b9>] tty_release+0x149/0x820 [ 1723.681006] [<ffffffff8117ee70>] ? exit_mmap+0x120/0x190 [ 1723.681006] [<ffffffff811a4139>] ? fput+0x199/0x2b0 [ 1723.681006] [<ffffffff811a405e>] fput+0xbe/0x2b0 [ 1723.681006] [<ffffffff8119f953>] filp_close+0x63/0x90 [ 1723.681006] [<ffffffff810a0d1f>] put_files_struct+0x7f/0xf0 [ 1723.681006] [<ffffffff810a29c7>] do_exit+0x1c7/0xed0 [ 1723.681006] [<ffffffff811b8539>] ? sys_select+0x59/0x110 [ 1723.681006] [<ffffffff810a37e4>] sys_exit_group+0x54/0xc0 [ 1723.681006] [<ffffffff815b766b>] system_call_fastpath+0x16/0x1b [ 1723.681006] Code: 48 83 3b 00 0f 84 bd 00 00 00 48 8b 43 08 48 8b 3b 45 31 c0 c7 83 fc 06 00 00 00 00 00 00 b9 21 00 00 00 ba 22 00 00 00 45 31 e4 [ 1723.681006] 8b 00 8b 37 44 0f b6 48 02 c1 e6 08 c7 44 24 10 88 13 00 00 [ 1723.681006] RIP [<ffffffffa0079d7a>] acm_port_down+0x4a/0x100 [cdc_acm] [ 1723.681006] RSP <ffff880009c5dc98> [ 1723.932348] ---[ end trace a5c0598cb15eda99 ]--- -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html