On Thu, 2012-11-08 at 14:25 +0000, Prantis, Kelsey wrote: > On 11/7/12 4:26 PM, "Prantis, Kelsey" <kelsey.prantis@xxxxxxxxx> wrote: > >>So I'll try to reproduce using some different combinations of > >>fabric_max_sectors, but for the moment please apply the following debug > >>patch to dump struct scatterlist + struct bio members and try to > >>reproduce. > > > >Am working on applying the patch now, and will get back to you when I have > >a result. > > There was quite a bit of output, which can be fetched from: > ftp://ftp.whamcloud.com/uploads/lio-debug.txt > Hi Kelsey, Thanks for the log. After an quick look, here's what I see so far: Start bio ffff880079806180 at 23304: [ 380.597511] IBLOCK: block_lba: 23304 sgl_nents: 256 CDB payload size is sgl_nents * PAGE_SIZE, so 1M here.. [ 380.597936] IBLOCK: sg[0]: ffff8800788b0000 addr: ffffea0001e5f100 length: 4096 offset: 0 [ 380.598623] IBLOCK: bio: ffff880079806180 bi_sector: 23304 bi_size: 4096 [ 380.599168] IBLOCK: sg[1]: ffff8800788b0020 addr: ffffea0000db49c0 length: 4096 offset: 0 [ 380.599843] IBLOCK: bio: ffff880079806180 bi_sector: 23304 bi_size: 8192 ..... [ 380.768537] IBLOCK: bio: ffff880079806180 bi_sector: 23304 bi_size: 520192 [ 380.769085] IBLOCK: sg[127]: ffff8800788b0fe0 addr: ffffea0001e24200 length: 4096 offset: 0 [ 380.769783] IBLOCK: bio: ffff880079806180 bi_sector: 23304 bi_size: 524288 [ 380.770407] IBLOCK: sg[128]: ffff8800788b1000 addr: ffffea0001e24240 length: 4096 offset: 0 [ 380.771073] IBLOCK: bio_add_page(), bio ffff880079806180 bio_cnt: 1 First bio is out of room, start second bio ffff880079fd7b40 at 24328: [ 380.775617] IBLOCK: bio: ffff880079fd7b40 bi_sector: 24328 bi_size: 4096 [ 380.776163] IBLOCK: sg[129]: ffff8800788b1020 addr: ffffea0001e24280 length: 4096 offset: 0 [ 380.776862] IBLOCK: bio: ffff880079fd7b40 bi_sector: 24328 bi_size: 8192 ..... [ 380.935985] IBLOCK: sg[254]: ffff8800788b1fc0 addr: ffffea0001e261c0 length: 4096 offset: 0 [ 380.936792] IBLOCK: bio: ffff880079fd7b40 bi_sector: 24328 bi_size: 520192 [ 380.937338] IBLOCK: sg[255]: ffff8800788b1fe0 addr: ffffea0001e26200 length: 4096 offset: 0 [ 380.938123] IBLOCK: bio: ffff880079fd7b40 bi_sector: 24328 bi_size: 524288 Until an eventual OOPs at: [ 381.123267] IBLOCK: bio: ffff880079806900 bi_sector: 2097136 bi_size: 4096 [ 381.123899] BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8 [ 381.124258] IP: [<ffffffffa00ed09d>] iscsit_add_cmd_to_response_queue+0x8d/0xc0 [iscsi_target_mod] [ 381.124258] PGD 0 [ 381.124258] Oops: 0000 [#1] SMP [ 381.124258] Modules linked in: target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables i2c_piix4 i2c_core virtio_net virtio_balloon microcode joydev virtio_blk [ 381.124258] CPU 0 [ 381.124258] Pid: 1000, comm: kworker/0:0 Not tainted 3.6.3-1.chromatestlab2.fc17.x86_64 #1 Red Hat KVM [ 381.124258] RIP: 0010:[<ffffffffa00ed09d>] [<ffffffffa00ed09d>] iscsit_add_cmd_to_response_queue+0x8d/0xc0 [iscsi_target_mod] [ 381.124258] RSP: 0018:ffff880036855d70 EFLAGS: 00010246 [ 381.124258] RAX: 0000000000000000 RBX: ffff88003710d000 RCX: 0000000000000000 [ 381.124258] RDX: ffff88003710d330 RSI: ffff88003710d330 RDI: ffffffffa00ed096 [ 381.124258] RBP: ffff880036855d90 R08: ffff88003710d330 R09: 0000000000000000 [ 381.124258] R10: ffff88007a03cee0 R11: ffff880079e0f5a0 R12: ffff88007b981300 [ 381.124258] R13: ffff88007a03cef0 R14: ffff88003710d2d0 R15: ffff88007fc0e490 [ 381.124258] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 [ 381.124258] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 381.124258] CR2: 00000000000000f8 CR3: 000000007b24d000 CR4: 00000000000006f0 [ 381.124258] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 381.124258] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 381.124258] Process kworker/0:0 (pid: 1000, threadinfo ffff880036854000, task ffff88007b592e20) [ 381.124258] Stack: [ 381.124258] ffff88007b9816a8 ffff88007b981520 ffffe8ffffc00200 0000000000000000 [ 381.124258] ffff880036855da0 ffffffffa00f95d8 ffff880036855dc0 ffffffffa0095c60 [ 381.124258] ffff88007bf57280 ffff88007b9816a8 ffff880036855e30 ffffffff81077be7 [ 381.124258] Call Trace: [ 381.124258] [<ffffffffa00f95d8>] lio_queue_status+0x28/0x30 [iscsi_target_mod] [ 381.124258] [<ffffffffa0095c60>] target_complete_ok_work+0x190/0x200 [target_core_mod] [ 381.124258] [<ffffffff81077be7>] process_one_work+0x147/0x4a0 [ 381.124258] [<ffffffffa0095ad0>] ? target_complete_failure_work+0x20/0x20 [target_core_mod] [ 381.124258] [<ffffffff8107994e>] worker_thread+0x15e/0x480 [ 381.124258] [<ffffffff810797f0>] ? manage_workers+0x2f0/0x2f0 [ 381.124258] [<ffffffff8107ed13>] kthread+0x93/0xa0 [ 381.124258] [<ffffffff81626144>] kernel_thread_helper+0x4/0x10 [ 381.124258] [<ffffffff8107ec80>] ? kthread_freezable_should_stop+0x70/0x70 [ 381.124258] [<ffffffff81626140>] ? gs_change+0x13/0x13 [ 381.124258] Code: 38 03 00 00 48 8d 93 30 03 00 00 4c 89 ef e8 3b 01 20 e1 3e 41 ff 84 24 d4 00 00 00 4c 89 f7 e8 7a ff 52 e1 48 8b 83 70 03 00 00 <48> 8b b8 f8 00 00 00 e8 c7 42 fa e0 48 8b 5d e0 4c 8b 65 e8 4c [ 381.124258] RIP [<ffffffffa00ed09d>] iscsit_add_cmd_to_response_queue+0x8d/0xc0 [iscsi_target_mod] [ 381.124258] RSP <ffff880036855d70> [ 381.124258] CR2: 00000000000000f8 [ 381.147997] ---[ end trace e0be12745e6881c7 ]--- So AFAICT this looks like a iblock bio completion callback is being invoked more than once, that is being triggered when using multiple BIOs to complete a single se_cmd descriptor. Looking at the code in question from target_core_iblock.c, I think I see what's going on now. There is a double completion happening between the iblock_complete_cmd call in iblock_execute_rw() and iblock_bio_done() that appears to have been introduced with: commit 5787cacd0bd5ee016ad807b244550d34fe2beebe Author: Christoph Hellwig <hch@xxxxxxxxxxxxx> Date: Tue Apr 24 00:25:06 2012 -0400 target: remove struct se_task I'm currently testing the following patch to address this bug. Please give it a shot with your setup. Thanks! --nab diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c index 57d7674..305aa41 100644 --- a/drivers/target/target_core_iblock.c +++ b/drivers/target/target_core_iblock.c @@ -658,7 +658,7 @@ static int iblock_execute_rw(struct se_cmd *cmd) bio_list_init(&list); bio_list_add(&list, bio); - atomic_set(&ibr->pending, 2); + atomic_set(&ibr->pending, 1); bio_cnt = 1; for_each_sg(sgl, sg, sgl_nents, i) { @@ -689,7 +689,6 @@ static int iblock_execute_rw(struct se_cmd *cmd) } iblock_submit_bios(&list, rw); - iblock_complete_cmd(cmd); return 0; fail_put_bios: -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html