On Mon, 2019-03-25 at 10:26 +0100, Hannes Reinecke wrote:
> The original issue leading to this patchset was this crash:
>
> [159135.508116] Pid: 2638, comm: ssea Tainted: G W X 3.0.101-0.40-default #1 HP ProLiant BL460c Gen9
> [159135.508119] RIP: 0010:[<ffffffffa00bb5d1>] [<ffffffffa00bb5d1>] scsi_device_get+0x11/0xb0 [scsi_mod]
> [159135.508126] RSP: 0018:ffff88100fdf5c88 EFLAGS: 00010296
> [159135.508128] RAX: ffff88101b31d5c0 RBX: 0000000000000000 RCX: ffff88101b31d5c0
> [159135.508130] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
> [159135.508132] RBP: ffff88101c1c4780 R08: 0000000000000000 R09: ffff88201f387af0
> [159135.508134] R10: ffff88100fdf5e68 R11: ffffffff811eee70 R12: ffffffffa06ea120
> [159135.508136] R13: ffff88201ef903c0 R14: ffff881007bdda00 R15: ffff88101c1c4780
> [159135.508139] FS: 00007faae06d2700(0000) GS:ffff88107fc00000(0000) knlGS:0000000000000000
> [159135.508141] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [159135.508143] CR2: 0000000000000650 CR3: 0000001018d0f000 CR4: 00000000001407f0
> [159135.508145] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [159135.508148] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [159135.508150] Process ssea (pid: 2638, threadinfo ffff88100fdf4000, task ffff881012080140)
> [159135.508152] Stack:
> [159135.508153] ffff88201ef903c0 ffff88101b31d5c0 ffff88101c1c4780 ffffffffa06e767c
> [159135.508160] 0000000000000000 0000000000000000 0000000000000000 ffffffff8116119e
> [159135.508163] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [159135.508167] Call Trace:
> [159135.508177] [<ffffffffa06e767c>] ch_open+0x4c/0xa0 [ch]
> [159135.508189] [<ffffffff8116119e>] chrdev_open+0x13e/0x200
> [159135.508196] [<ffffffff8115ade8>] __dentry_open+0x198/0x310
> [159135.508201] [<ffffffff8116a432>] do_last+0x1f2/0x800
> [159135.508206] [<ffffffff8116b6a9>] path_openat+0xd9/0x420
> [159135.508210] [<ffffffff8116bb2c>] do_filp_open+0x4c/0xc0
> [159135.508214] [<ffffffff8115c7cf>] do_sys_open+0x17f/0x250
> [159135.508219] [<ffffffff8146c292>] system_call_fastpath+0x16/0x1b
> [159135.508225] [<00007faadfa2a040>] 0x7faadfa2a03f
> [159135.508227] Code: 56 27 e1 0f 1f 80 00 00 00 00 48 89 df e8 98 47 fe e0 eb d5 66 0f 1f 44 00 00 48 83 ec 18 48 89 5c 24 08 48 89 6c 24 10 48 89 fb
> 83>[159135.508241] bf 50 06 00 00 04 75 16 b8 fa ff ff ff 48 8b 5c 24 08 48 8b
> [159135.508248] RIP [<ffffffffa00bb5d1>] scsi_device_get+0x11/0xb0 [scsi_mod]
> [159135.508254] RSP <ffff88100fdf5c88>
> [159135.508256] CR2: 0000000000000650
>
> And we had been crashing because 'ch->device' was NULL in ch_open().
> This patch is to guarantee atomicity on 'scsi_device_put()' and
> 'ch->device = NULL'; otherwise we'd be having a race window between
> those calls, allowing another thread to find a 'ch' device with an
> invalid but non-NULL ch->device pointer.

Hi Hannes,

Thank you for having shared this call trace. Do you agree that moving the
ch->device = NULL assignment from ch_release() into ch_destroy() is
sufficient to fix this crash?

Bart.
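To make the difference between the two placements of the ch->device = NULL store concrete, here is a hypothetical user-space model. It is not code from drivers/scsi/ch.c: struct sdev, struct changer, table, NR_MINORS, model_open(), release_buggy(), release_fixed(), changer_destroy() and replay() are invented names that only mirror the roles of scsi_device_get()/scsi_device_put(), ch_open(), ch_release() and ch_destroy(). The model replays a single interleaving (open, close, then open the same minor again) without threads, and it assumes the lookup table keeps pointing at the changer until the device is removed. Where the kernel would oops on a NULL ch->device, the model returns -EFAULT instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define NR_MINORS 1 /* hypothetical: one changer minor is enough here */

/* Hypothetical stand-in for struct scsi_device: only a reference count. */
struct sdev {
	int refs;
};

/* Hypothetical stand-in for the changer object. */
struct changer {
	int refs;            /* models the changer's kref */
	struct sdev *device; /* models ch->device */
};

/* Models the minor -> changer lookup table (an IDR in the driver). */
static struct changer *table[NR_MINORS];

/* Models scsi_device_get(): the kernel would dereference a NULL sdev;
 * the model reports that case as -EFAULT instead of crashing. */
static int sdev_get(struct sdev *sdev)
{
	if (sdev == NULL)
		return -EFAULT;
	sdev->refs++;
	return 0;
}

/* Models scsi_device_put(). */
static void sdev_put(struct sdev *sdev)
{
	sdev->refs--;
}

/* Suggested placement: clear the pointer only on the final put.
 * (In this model the store is immediately followed by free().) */
static void changer_destroy(struct changer *ch)
{
	ch->device = NULL;
	free(ch);
}

static void changer_put(struct changer *ch)
{
	if (--ch->refs == 0)
		changer_destroy(ch);
}

/* Models the open path: look up the changer, pin the device, then the changer. */
static int model_open(int minor, struct changer **out)
{
	struct changer *ch;
	int ret;

	if (minor < 0 || minor >= NR_MINORS || table[minor] == NULL)
		return -ENXIO;
	ch = table[minor];
	ret = sdev_get(ch->device);
	if (ret)
		return ret;
	ch->refs++;
	*out = ch;
	return 0;
}

/* Release that clears the shared pointer while the changer is still findable. */
static void release_buggy(struct changer *ch)
{
	sdev_put(ch->device);
	ch->device = NULL;
	changer_put(ch);
}

/* Release that leaves ch->device alone; changer_destroy() clears it. */
static void release_fixed(struct changer *ch)
{
	sdev_put(ch->device);
	changer_put(ch);
}

/* Replays open, close, open on minor 0; returns the second open's result. */
static int replay(void (*release)(struct changer *))
{
	struct sdev dev = { .refs = 1 }; /* reference held by the "SCSI core" */
	struct changer *ch = calloc(1, sizeof(*ch));
	struct changer *first, *second;
	int ret;

	if (ch == NULL)
		return -ENOMEM;
	ch->refs = 1; /* reference held by the lookup table */
	ch->device = &dev;
	table[0] = ch;

	ret = model_open(0, &first); /* first open() succeeds */
	assert(ret == 0);
	release(first);              /* close() */
	ret = model_open(0, &second); /* open() of the same minor again */
	if (ret == 0)
		release(second);

	table[0] = NULL;  /* models device removal */
	changer_put(ch);  /* drops the table's reference, frees ch */
	return ret;
}
```

Calling replay(release_buggy) yields -EFAULT, standing in for the NULL dereference in scsi_device_get(), while replay(release_fixed) yields 0. The model deliberately says nothing about the concurrent window between scsi_device_put() and the pointer store that Hannes describes; it only shows that clearing the pointer on the destroy path keeps it valid for later openers.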