On Thu, 2012-11-15 at 15:50 -0800, Nicholas A. Bellinger wrote: > On Thu, 2012-11-15 at 21:26 +0000, Prantis, Kelsey wrote: > > On 11/13/12 2:22 PM, "Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx> wrote: > > > > <SNIP> > > > Hi Nicholas, > > > > Sorry for the delay. The new debug output with your latest patch (and typo > > adjustment) is up at ftp://ftp.whamcloud.com/uploads/lio-debug-4.txt.bz2 > > > > Hi Kelsey, > > Thanks alot for this new dump. It appears you where able to trigger one > of the extra BUG_ON(!cmd) checks added in iblock_bio_destructor(): > <SNIP> > So it appears that iblock_bio_destructor()'s bio->bi_private assignment > for the saved se_cmd pointer, and likely the leading se_cmd reference > from the same ->bi_private in iblock_bio_done() is being cleared and/or > stomped upon by something ahead of the callback into IBLOCK code.. > > Still not exactly sure what's going on here, but I'm starting to lean > toward some type of virtio-blk w/ IBLOCK specific bug that overflow's > something in virtio with large requests. In any event, I'll spend some > time with with virtio-blk + IBLOCK exports next, and try to reproduce on > v3.7-rc code. > So after spending some time with virtio-blk <-> IBLOCK export in KVM guest this evening, I'm now able to reproduce the same OOPs you've been observing with iscsi-target LUNs, using tcm_loop LUNs w/ explicit max_sectors_kb=8192 settings in order to generate the largish (16384 sector * 512 byte block = 8M) requests with virtio-blk backend export. Thus far the bug is occurring during fio write-verify tests w/ blocksize=8M with the tcm_loop SCSI LUN (/dev/sdX). For reference, below is the kernel log. So I'll keep poking at this tomorrow, and will (hopefully) get this bug identified ASAP. Thank again Kelsey! --nab [ 2318.466923] BUG: unable to handle kernel NULL pointer dereference at (null) [ 2318.467701] IP: [<ffffffffa04dc482>] iblock_complete_cmd+0x8/0x33 [target_core_iblock] [ 2318.467701] PGD 3792b067 PUD 7ba7b067 PMD 0 [ 2318.467701] Oops: 0002 [#1] SMP [ 2318.467701] Modules linked in: ib_srpt tcm_qla2xxx tcm_loop tcm_fc iscsi_target_mod target_core_pscsi target_core_file target_core_iblock target_core_mod ib_cm ib_sa ib_mad ib_core qla2xxx libfc scsi_transport_fc configfs ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi e1000 virtio_blk sr_mod cdrom sd_mod virtio_pci virtio_ring virtio ata_piix libata [last unloaded: target_core_mod] [ 2318.467701] CPU 3 [ 2318.467701] Pid: 8512, comm: kworker/3:33 Not tainted 3.6.0-rc6+ #4 Bochs Bochs [ 2318.467701] RIP: 0010:[<ffffffffa04dc482>] [<ffffffffa04dc482>] iblock_complete_cmd+0x8/0x33 [target_core_iblock] [ 2318.467701] RSP: 0018:ffff88007aff3d10 EFLAGS: 00010282 [ 2318.467701] RAX: ffff88004a656d70 RBX: 0000000000000000 RCX: ffff88007b6035f0 [ 2318.467701] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ffff88007b04ca90 [ 2318.467701] RBP: ffff88003ddba000 R08: 0000000000000000 R09: 8000000000000000 [ 2318.467701] R10: 0000000000000001 R11: 0000000000046000 R12: 0000000002698000 [ 2318.467701] R13: ffff8800379d9dc0 R14: 0000000000000000 R15: 0000000000000001 [ 2318.467701] FS: 0000000000000000(0000) GS:ffff88007fd80000(0000) knlGS:0000000000000000 [ 2318.467701] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 2318.467701] CR2: 0000000000000000 CR3: 0000000037b35000 CR4: 00000000000006e0 [ 2318.467701] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2318.467701] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 2318.467701] Process kworker/3:33 (pid: 8512, threadinfo ffff88007aff2000, task ffff88004a656d70) [ 2318.467701] Stack: [ 2318.467701] ffff88007b04ca90 ffffffffa04dcd8d 0000001000000800 0000000000000800 [ 2318.467701] 0000000000000000 0000000000000000 0000000000000246 ffff88007b04ca90 [ 2318.467701] ffff88007b04cbb0 ffff88007ad0f300 ffffffffa053a37a 0000000000000000 [ 2318.467701] Call Trace: [ 2318.467701] [<ffffffffa04dcd8d>] ? iblock_execute_rw+0x21f/0x278 [target_core_iblock] [ 2318.467701] [<ffffffffa053a37a>] ? tcm_loop_queue_status+0x96/0x96 [tcm_loop] [ 2318.467701] [<ffffffffa04a7103>] ? __target_execute_cmd+0x41/0x7a [target_core_mod] [ 2318.467701] [<ffffffffa053a8f3>] ? tcm_loop_write_pending+0x9/0x10 [tcm_loop] [ 2318.467701] [<ffffffffa053a37a>] ? tcm_loop_queue_status+0x96/0x96 [tcm_loop] [ 2318.467701] [<ffffffffa04a9181>] ? transport_generic_new_cmd+0x24f/0x2b1 [target_core_mod] [ 2318.467701] [<ffffffffa053a37a>] ? tcm_loop_queue_status+0x96/0x96 [tcm_loop] [ 2318.467701] [<ffffffffa04a9243>] ? transport_handle_cdb_direct+0x60/0x74 [target_core_mod] [ 2318.467701] [<ffffffff81041621>] ? process_one_work+0x1cd/0x2ae [ 2318.467701] [<ffffffff810436d2>] ? worker_thread+0x22d/0x32c [ 2318.467701] [<ffffffff810434a5>] ? manage_workers+0x254/0x254 [ 2318.467701] [<ffffffff810434a5>] ? manage_workers+0x254/0x254 [ 2318.467701] [<ffffffff81046c0f>] ? kthread+0x81/0x89 [ 2318.467701] [<ffffffff81385ac4>] ? kernel_thread_helper+0x4/0x10 [ 2318.467701] [<ffffffff81046b8e>] ? kthread_freezable_should_stop+0x53/0x53 [ 2318.467701] [<ffffffff81385ac0>] ? gs_change+0x13/0x13 [ 2318.467701] Code: 48 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 83 ec 08 48 c7 c6 20 d5 4d a0 48 83 c4 08 e9 78 d0 fc ff 53 48 8b 9f 08 02 00 00 <f0> ff 0b 0f 94 c0 84 c0 74 1f 8b 43 04 83 f8 01 19 f6 f7 d6 83 [ 2318.467701] RIP [<ffffffffa04dc482>] iblock_complete_cmd+0x8/0x33 [target_core_iblock] [ 2318.467701] RSP <ffff88007aff3d10> [ 2318.467701] CR2: 0000000000000000 [ 2318.531925] ---[ end trace ea423a9e4f3b8654 ]--- [ 2318.532803] BUG: unable to handle kernel paging request at fffffffffffffff8 [ 2318.533757] IP: [<ffffffff810467cb>] kthread_data+0x7/0xc [ 2318.533757] PGD 14e9067 PUD 14ea067 PMD 0 [ 2318.533757] Oops: 0000 [#2] SMP [ 2318.533757] Modules linked in: ib_srpt tcm_qla2xxx tcm_loop tcm_fc iscsi_target_mod target_core_pscsi target_core_file target_core_iblock target_core_mod ib_cm ib_sa ib_mad ib_core qla2xxx libfc scsi_transport_fc configfs ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi e1000 virtio_blk sr_mod cdrom sd_mod virtio_pci virtio_ring virtio ata_piix libata [last unloaded: target_core_mod] [ 2318.533757] CPU 3 [ 2318.533757] Pid: 8512, comm: kworker/3:33 Tainted: G D 3.6.0-rc6+ #4 Bochs Bochs [ 2318.533757] RIP: 0010:[<ffffffff810467cb>] [<ffffffff810467cb>] kthread_data+0x7/0xc [ 2318.533757] RSP: 0018:ffff88007aff39b0 EFLAGS: 00010002 [ 2318.533757] RAX: 0000000000000000 RBX: ffff88007fd91080 RCX: 0000000000000003 [ 2318.533757] RDX: ffffffff81659220 RSI: 0000000000000003 RDI: ffff88004a656d70 [ 2318.533757] RBP: ffff88004a656d70 R08: 0000000000000400 R09: ffffffff810340bf [ 2318.533757] R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000003 [ 2318.533757] R13: ffff88007aff3a88 R14: 0000000000000003 R15: 0000000000000001 [ 2318.533757] FS: 0000000000000000(0000) GS:ffff88007fd80000(0000) knlGS:0000000000000000 [ 2318.533757] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 2318.533757] CR2: fffffffffffffff8 CR3: 0000000037b35000 CR4: 00000000000006e0 [ 2318.533757] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2318.533757] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 2318.533757] Process kworker/3:33 (pid: 8512, threadinfo ffff88007aff2000, task ffff88004a656d70) [ 2318.533757] Stack: [ 2318.533757] ffffffff81042fb7 ffff88007fd91080 ffff88004a656d70 ffff88004a657028 [ 2318.533757] ffffffff8137e5fb ffff88004a656d70 ffff88004a656d70 0000000000011080 [ 2318.566589] ffff88007aff3fd8 0000000000011080 ffff88007aff3fd8 0000000000011080 [ 2318.566589] Call Trace: [ 2318.566589] [<ffffffff81042fb7>] ? wq_worker_sleeping+0xc/0x65 [ 2318.570534] [<ffffffff8137e5fb>] ? __schedule+0x147/0x470 [ 2318.570534] [<ffffffff8103410b>] ? do_exit+0x7fd/0x801 [ 2318.570534] [<ffffffff8137d7a0>] ? printk+0x4c/0x51 [ 2318.570534] [<ffffffff813803ef>] ? oops_end+0xb1/0xb6 [ 2318.570534] [<ffffffff81025fb3>] ? no_context+0x254/0x263 [ 2318.570534] [<ffffffff81382545>] ? do_page_fault+0x1ad/0x34c [ 2318.570534] [<ffffffff811a5d4f>] ? blk_recount_segments+0x16/0x25 [ 2318.570534] [<ffffffff811a1006>] ? drive_stat_acct+0xfb/0x137 [ 2318.578642] [<ffffffff811a3075>] ? blk_queue_bio+0x269/0x2bf [ 2318.578642] [<ffffffff8137f9a5>] ? async_page_fault+0x25/0x30 [ 2318.578642] [<ffffffffa04dc482>] ? iblock_complete_cmd+0x8/0x33 [target_core_iblock] [ 2318.578642] [<ffffffffa04dcd8d>] ? iblock_execute_rw+0x21f/0x278 [target_core_iblock] [ 2318.578642] [<ffffffffa053a37a>] ? tcm_loop_queue_status+0x96/0x96 [tcm_loop] [ 2318.578642] [<ffffffffa04a7103>] ? __target_execute_cmd+0x41/0x7a [target_core_mod] [ 2318.578642] [<ffffffffa053a8f3>] ? tcm_loop_write_pending+0x9/0x10 [tcm_loop] [ 2318.586625] [<ffffffffa053a37a>] ? tcm_loop_queue_status+0x96/0x96 [tcm_loop] [ 2318.586625] [<ffffffffa04a9181>] ? transport_generic_new_cmd+0x24f/0x2b1 [target_core_mod] [ 2318.586625] [<ffffffffa053a37a>] ? tcm_loop_queue_status+0x96/0x96 [tcm_loop] [ 2318.590543] [<ffffffffa04a9243>] ? transport_handle_cdb_direct+0x60/0x74 [target_core_mod] [ 2318.590543] [<ffffffff81041621>] ? process_one_work+0x1cd/0x2ae [ 2318.590543] [<ffffffff810436d2>] ? worker_thread+0x22d/0x32c [ 2318.594606] [<ffffffff810434a5>] ? manage_workers+0x254/0x254 [ 2318.594606] [<ffffffff810434a5>] ? manage_workers+0x254/0x254 [ 2318.594606] [<ffffffff81046c0f>] ? kthread+0x81/0x89 [ 2318.594606] [<ffffffff81385ac4>] ? kernel_thread_helper+0x4/0x10 [ 2318.594606] [<ffffffff81046b8e>] ? kthread_freezable_should_stop+0x53/0x53 [ 2318.594606] [<ffffffff81385ac0>] ? gs_change+0x13/0x13 [ 2318.594606] Code: 89 e2 fe ff ff eb dc 41 bf ea ff ff ff eb 89 90 65 48 8b 04 25 40 b7 00 00 48 8b 80 60 02 00 00 8b 40 f0 c3 48 8b 87 60 02 00 00 <48> 8b 40 f8 c3 48 3b 3d b9 29 61 00 75 08 0f bf 87 22 06 00 00 [ 2318.594606] RIP [<ffffffff810467cb>] kthread_data+0x7/0xc [ 2318.594606] RSP <ffff88007aff39b0> [ 2318.594606] CR2: fffffffffffffff8 [ 2318.594606] ---[ end trace ea423a9e4f3b8655 ]--- [ 2318.594606] Fixing recursive fault but reboot is needed! [ 2319.498258] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 [ 2319.499233] IP: [<ffffffffa04dc87f>] iblock_bio_destructor+0xc/0x20 [target_core_iblock] [ 2319.499233] PGD 7bb7f067 PUD 7b066067 PMD 0 [ 2319.499233] Oops: 0000 [#3] SMP [ 2319.499233] Modules linked in: ib_srpt tcm_qla2xxx tcm_loop tcm_fc iscsi_target_mod target_core_pscsi target_core_file target_core_iblock target_core_mod ib_cm ib_sa ib_mad ib_core qla2xxx libfc scsi_transport_fc configfs ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi e1000 virtio_blk sr_mod cdrom sd_mod virtio_pci virtio_ring virtio ata_piix libata [last unloaded: target_core_mod] [ 2319.499233] CPU 0 [ 2319.499233] Pid: 0, comm: swapper/0 Tainted: G D 3.6.0-rc6+ #4 Bochs Bochs [ 2319.499233] RIP: 0010:[<ffffffffa04dc87f>] [<ffffffffa04dc87f>] iblock_bio_destructor+0xc/0x20 [target_core_iblock] [ 2319.499233] RSP: 0018:ffff88007fc03dc8 EFLAGS: 00010092 [ 2319.499233] RAX: 0000000000000000 RBX: ffff8800378f43c0 RCX: 0000000000080000 [ 2319.499233] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800378f43c0 [ 2319.499233] RBP: ffff88007b04ca90 R08: 0000000000013d20 R09: ffff88007d002700 [ 2319.499233] R10: ffffffffa00410f9 R11: 0000000000000293 R12: 0000000000000000 [ 2319.499233] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 2319.499233] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 [ 2319.499233] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 2319.499233] CR2: 0000000000000028 CR3: 000000007b70a000 CR4: 00000000000006f0 [ 2319.499233] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2319.499233] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 2319.499233] Process swapper/0 (pid: 0, threadinfo ffffffff814dc000, task ffffffff814ef410) [ 2319.499233] Stack: [ 2319.499233] ffff88000000003b ffffffffa04dc7e4 ffff88007b7e5c38 ffff8800378f43c0 [ 2319.499233] 0000000000080000 ffffffff811a0cc6 ffff880000080000 000000007af106f8 [ 2319.499233] ffff880037894000 ffff88007b7e5c38 0000000000000000 0000000000000000 [ 2319.499233] Call Trace: [ 2319.499233] <IRQ> [ 2319.499233] [<ffffffffa04dc7e4>] ? iblock_bio_done+0x4b/0x57 [target_core_iblock] [ 2319.499233] [<ffffffff811a0cc6>] ? blk_update_request+0x171/0x32b [ 2319.499233] [<ffffffff811a0e91>] ? blk_update_bidi_request+0x11/0x5b [ 2319.499233] [<ffffffff811a1b25>] ? __blk_end_bidi_request+0x10/0x2b [ 2319.499233] [<ffffffff811a1b5d>] ? __blk_end_request_all+0x1d/0x2a [ 2319.499233] [<ffffffffa00671c2>] ? blk_done+0x98/0xf1 [virtio_blk] [ 2319.499233] [<ffffffffa0041286>] ? vring_interrupt+0x6f/0x76 [virtio_ring] [ 2319.499233] [<ffffffff8108448c>] ? handle_irq_event_percpu+0x2d/0x130 [ 2319.499233] [<ffffffff810845bd>] ? handle_irq_event+0x2e/0x4c [ 2319.499233] [<ffffffff8108694f>] ? handle_edge_irq+0x98/0xb9 [ 2319.499233] [<ffffffff81003aa7>] ? handle_irq+0x17/0x20 [ 2319.499233] [<ffffffff810032da>] ? do_IRQ+0x45/0xad [ 2319.499233] [<ffffffff8137f72a>] ? common_interrupt+0x6a/0x6a [ 2319.499233] <EOI> [ 2319.499233] [<ffffffff810229fc>] ? native_safe_halt+0x2/0x3 [ 2319.499233] [<ffffffff8100887e>] ? default_idle+0x23/0x3f [ 2319.499233] [<ffffffff81008b02>] ? cpu_idle+0x6b/0xaa [ 2319.499233] [<ffffffff81563c43>] ? start_kernel+0x347/0x352 [ 2319.499233] [<ffffffff81563731>] ? kernel_init+0x1c3/0x1c3 [ 2319.499233] [<ffffffff815633c3>] ? x86_64_start_kernel+0x102/0x10f [ 2319.499233] Code: 60 73 c8 4d a0 49 c7 45 50 99 c7 4d a0 4d 89 65 00 48 83 c4 08 4c 89 e8 5b 5d 41 5c 41 5d c3 48 83 ec 08 48 8b 47 58 48 8b 40 70 <48> 8b 40 28 48 8b b0 08 02 00 00 48 83 c4 08 e9 29 59 c1 e0 55 [ 2319.499233] RIP [<ffffffffa04dc87f>] iblock_bio_destructor+0xc/0x20 [target_core_iblock] [ 2319.499233] RSP <ffff88007fc03dc8> [ 2319.499233] CR2: 0000000000000028 [ 2319.499233] ---[ end trace ea423a9e4f3b8656 ]--- [ 2319.499233] Kernel panic - not syncing: Fatal exception in interrupt [ 2319.499233] Shutting down cpus with NMI -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html