On 01/29/2016 04:07 PM, Mike Snitzer wrote:
> On Fri, Jan 29 2016 at 1:42pm -0500,
> Bart Van Assche <bart.vanassche@xxxxxxxxxxx> wrote:
>> On 01/28/2016 03:39 PM, Bart Van Assche wrote:
>>> There is a regression in the 4.5-rc1 kernel with regard to multipath
>>> setup. On my SRP I usually use for these tests after a few minutes a
>>> kernel crash occurs and the console freezes. A screenshot has been attached.
>>
>> (replying to my own e-mail)
>
> Not sure where you sent your first email.. not seeing it on dm-devel
> archives.
>
> So I don't have the original screenshot you attached.
>
> The 4.5 merge window didn't see any changes to DM mpath or DM core. So
> any regression is very likely outside DM and rooted in SRP or whatever
> other dependencies your setup relies on.

Hello Mike,

The behavior I see with kernel v4.5-rc3 is different from what I saw with
v4.5-rc1, but it is still not the behavior I expect. The call trace that
was triggered this morning on my test setup can be found below. I assume
the information below means that tio->ti->type is NULL in dm_done()?

Bart.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
IP: [<ffffffffa00020e5>] dm_done+0x35/0x1b0 [dm_mod]
PGD 456993067 PUD 40c76a067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: scsi_dh_alua dm_queue_length netconsole autofs4 ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm configfs ib_cm iw_cm dm_round_robin dm_multipath iTCO_wdt iTCO_vendor_support ipmi_devintf dcdbas ipmi_si ipmi_msghandler sb_edac edac_core lpc_ich mfd_core tg3 libphy ptp pps_core sg wmi ext4(E) jbd2(E) mbcache(E) sr_mod(E) cdrom(E) sd_mod(E) ahci(E) libahci(E) mlx4_ib(E) ib_sa(E) ib_mad(E) ib_core(E) ib_addr(E) ipv6(E) mlx4_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
CPU: 0 PID: 618 Comm: kworker/0:1H Tainted: G E 4.5.0-rc3+ #3
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
Workqueue: kblockd blk_mq_run_work_fn
task: ffff880437fa5e80 ti: ffff880437a6c000 task.ti: ffff880437a6c000
RIP: 0010:[<ffffffffa00020e5>] [<ffffffffa00020e5>] dm_done+0x35/0x1b0 [dm_mod]
RSP: 0018:ffff88046e403e38 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff8803f6a98d70 RCX: dead000000000200
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffc9000933c040
sd 23:0:0:1: Asymmetric access state changed
device-mapper: multipath: Failing path 67:176.
device-mapper: multipath: Failing path 68:16.
sd 24:0:0:1: Asymmetric access state changed
RBP: ffff88046e403e78 R08: ffff8803f6a98c78 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88006c0f2680
R13: ffff8803f6a98c00 R14: ffff88046e403ec8 R15: 0000000000000005
FS: 0000000000000000(0000) GS:ffff88046e400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000060 CR3: 000000041defd000 CR4: 00000000001406f0
Stack:
 0000000000000003 0000000000000002 ffff88046e403e78 ffff8803f6a98d70
 ffff8803f6a98c00 ffff8803f6a98c00 ffff88046e403ec8 0000000000000005
 ffff88046e403ea8 ffffffffa00022ac ffffffff81a090e0 ffff8803f6a98c78
Call Trace:
 <IRQ>
 [<ffffffffa00022ac>] dm_softirq_done+0x4c/0xd0 [dm_mod]
 [<ffffffff812476ac>] blk_done_softirq+0x8c/0xb0
 [<ffffffff8105be66>] __do_softirq+0xf6/0x240
 [<ffffffff8105c0bc>] irq_exit+0xac/0xc0
 [<ffffffff8103afde>] smp_call_function_single_interrupt+0x2e/0x40
 [<ffffffff81535779>] call_function_single_interrupt+0x89/0x90
 <EOI>
 [<ffffffff8153422d>] ? _raw_spin_unlock_irqrestore+0x3d/0x60
 [<ffffffffa03515bc>] multipath_busy+0xcc/0xf0 [dm_multipath]
 [<ffffffffa00045bd>] dm_mq_queue_rq+0x7d/0x180 [dm_mod]
 [<ffffffff81249cdb>] __blk_mq_run_hw_queue+0x29b/0x490
 [<ffffffff810a5fd3>] ? __lock_acquire+0x3b3/0x560
 [<ffffffff81249f10>] blk_mq_run_work_fn+0x10/0x20
 [<ffffffff810723ea>] process_one_work+0x1da/0x480
 [<ffffffff8107237a>] ? process_one_work+0x16a/0x480
 [<ffffffff810a62c4>] ? __lock_release+0xc4/0x3a0
 [<ffffffff81072f39>] worker_thread+0x169/0x520
 [<ffffffff81099d58>] ? complete+0x48/0x60
 [<ffffffff8153422b>] ? _raw_spin_unlock_irqrestore+0x3b/0x60
 [<ffffffff81072dd0>] ? maybe_create_worker+0x110/0x110
 [<ffffffff81072dd0>] ? maybe_create_worker+0x110/0x110
 [<ffffffff8152ee92>] ? schedule+0x42/0xb0
 [<ffffffff81072dd0>] ? maybe_create_worker+0x110/0x110
 [<ffffffff81078f94>] kthread+0xe4/0x100
 [<ffffffff810a4dcd>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff81081c99>] ? schedule_tail+0x19/0xd0
 [<ffffffff81078eb0>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff8153497f>] ret_from_fork+0x3f/0x70
 [<ffffffff81078eb0>] ? __init_kthread_worker+0x70/0x70
Code: 65 e0 48 89 5d d8 49 89 fc 4c 89 6d e8 4c 89 75 f0 4c 89 7d f8 48 8b 9f 60 01 00 00 48 8b 7b 08 48 85 ff 74 0c 48 8b 47 08 84 d2 <4c> 8b 40 60 75 44 41 89 f5 41 83 fd 87 0f 84 f2 00 00 00 45 85
RIP [<ffffffffa00020e5>] dm_done+0x35/0x1b0 [dm_mod]
 RSP <ffff88046e403e38>
CR2: 0000000000000060
---[ end trace f47c39416952f73a ]---
sd 31:0:0:1: Asymmetric access state changed
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in interrupt

$ gdb drivers/md/dm-mod.o
(gdb) list *(dm_done+0x35)
0x20e5 is in dm_done (drivers/md/dm.c:1273).
1268            int r = error;
1269            struct dm_rq_target_io *tio = clone->end_io_data;
1270            dm_request_endio_fn rq_end_io = NULL;
1271
1272            if (tio->ti) {
1273                    rq_end_io = tio->ti->type->rq_end_io;
1274
1275                    if (mapped && rq_end_io)
1276                            r = rq_end_io(tio->ti, clone, error, &tio->info);
1277            }

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel