Re: Too big sectors - exceeding fabric_max_sectors

"Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx> · Thu, 08 Nov 2012 10:10:24 -0800

On Thu, 2012-11-08 at 14:25 +0000, Prantis, Kelsey wrote:
> On 11/7/12 4:26 PM, "Prantis, Kelsey" <kelsey.prantis@xxxxxxxxx> wrote:
> >>So I'll try to reproduce using some different combinations of
> >>fabric_max_sectors, but for the moment please apply the following debug
> >>patch to dump struct scatterlist + struct bio members and try to
> >>reproduce.
> >
> >Am working on applying the patch now, and will get back to you when I have
> >a result.
> 
> There was quite a bit of output, which can be fetched from:
> ftp://ftp.whamcloud.com/uploads/lio-debug.txt
> 

Hi Kelsey,

Thanks for the log.  After an quick look, here's what I see so far:

Start bio ffff880079806180 at 23304:

[  380.597511] IBLOCK: block_lba: 23304 sgl_nents: 256

CDB payload size is sgl_nents * PAGE_SIZE, so 1M here..

[  380.597936] IBLOCK: sg[0]: ffff8800788b0000 addr: ffffea0001e5f100 length: 4096 offset: 0
[  380.598623] IBLOCK: bio: ffff880079806180 bi_sector: 23304 bi_size: 4096
[  380.599168] IBLOCK: sg[1]: ffff8800788b0020 addr: ffffea0000db49c0 length: 4096 offset: 0
[  380.599843] IBLOCK: bio: ffff880079806180 bi_sector: 23304 bi_size: 8192

.....

[  380.768537] IBLOCK: bio: ffff880079806180 bi_sector: 23304 bi_size: 520192
[  380.769085] IBLOCK: sg[127]: ffff8800788b0fe0 addr: ffffea0001e24200 length: 4096 offset: 0
[  380.769783] IBLOCK: bio: ffff880079806180 bi_sector: 23304 bi_size: 524288
[  380.770407] IBLOCK: sg[128]: ffff8800788b1000 addr: ffffea0001e24240 length: 4096 offset: 0
[  380.771073] IBLOCK: bio_add_page(), bio ffff880079806180 bio_cnt: 1

First bio is out of room, start second bio ffff880079fd7b40 at 24328:

[  380.775617] IBLOCK: bio: ffff880079fd7b40 bi_sector: 24328 bi_size: 4096
[  380.776163] IBLOCK: sg[129]: ffff8800788b1020 addr: ffffea0001e24280 length: 4096 offset: 0
[  380.776862] IBLOCK: bio: ffff880079fd7b40 bi_sector: 24328 bi_size: 8192

.....

[  380.935985] IBLOCK: sg[254]: ffff8800788b1fc0 addr: ffffea0001e261c0 length: 4096 offset: 0
[  380.936792] IBLOCK: bio: ffff880079fd7b40 bi_sector: 24328 bi_size: 520192
[  380.937338] IBLOCK: sg[255]: ffff8800788b1fe0 addr: ffffea0001e26200 length: 4096 offset: 0
[  380.938123] IBLOCK: bio: ffff880079fd7b40 bi_sector: 24328 bi_size: 524288

Until an eventual OOPs at:

[  381.123267] IBLOCK: bio: ffff880079806900 bi_sector: 2097136 bi_size: 4096
[  381.123899] BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8
[  381.124258] IP: [<ffffffffa00ed09d>] iscsit_add_cmd_to_response_queue+0x8d/0xc0 [iscsi_target_mod]
[  381.124258] PGD 0
[  381.124258] Oops: 0000 [#1] SMP
[  381.124258] Modules linked in: target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables i2c_piix4 i2c_core virtio_net virtio_balloon microcode joydev virtio_blk
[  381.124258] CPU 0
[  381.124258] Pid: 1000, comm: kworker/0:0 Not tainted 3.6.3-1.chromatestlab2.fc17.x86_64 #1 Red Hat KVM
[  381.124258] RIP: 0010:[<ffffffffa00ed09d>]  [<ffffffffa00ed09d>] iscsit_add_cmd_to_response_queue+0x8d/0xc0 [iscsi_target_mod]
[  381.124258] RSP: 0018:ffff880036855d70  EFLAGS: 00010246
[  381.124258] RAX: 0000000000000000 RBX: ffff88003710d000 RCX: 0000000000000000
[  381.124258] RDX: ffff88003710d330 RSI: ffff88003710d330 RDI: ffffffffa00ed096
[  381.124258] RBP: ffff880036855d90 R08: ffff88003710d330 R09: 0000000000000000
[  381.124258] R10: ffff88007a03cee0 R11: ffff880079e0f5a0 R12: ffff88007b981300
[  381.124258] R13: ffff88007a03cef0 R14: ffff88003710d2d0 R15: ffff88007fc0e490
[  381.124258] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[  381.124258] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  381.124258] CR2: 00000000000000f8 CR3: 000000007b24d000 CR4: 00000000000006f0
[  381.124258] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  381.124258] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  381.124258] Process kworker/0:0 (pid: 1000, threadinfo ffff880036854000, task ffff88007b592e20)
[  381.124258] Stack:
[  381.124258]  ffff88007b9816a8 ffff88007b981520 ffffe8ffffc00200 0000000000000000
[  381.124258]  ffff880036855da0 ffffffffa00f95d8 ffff880036855dc0 ffffffffa0095c60
[  381.124258]  ffff88007bf57280 ffff88007b9816a8 ffff880036855e30 ffffffff81077be7
[  381.124258] Call Trace:
[  381.124258]  [<ffffffffa00f95d8>] lio_queue_status+0x28/0x30 [iscsi_target_mod]
[  381.124258]  [<ffffffffa0095c60>] target_complete_ok_work+0x190/0x200 [target_core_mod]
[  381.124258]  [<ffffffff81077be7>] process_one_work+0x147/0x4a0
[  381.124258]  [<ffffffffa0095ad0>] ? target_complete_failure_work+0x20/0x20 [target_core_mod]
[  381.124258]  [<ffffffff8107994e>] worker_thread+0x15e/0x480
[  381.124258]  [<ffffffff810797f0>] ? manage_workers+0x2f0/0x2f0
[  381.124258]  [<ffffffff8107ed13>] kthread+0x93/0xa0
[  381.124258]  [<ffffffff81626144>] kernel_thread_helper+0x4/0x10
[  381.124258]  [<ffffffff8107ec80>] ? kthread_freezable_should_stop+0x70/0x70
[  381.124258]  [<ffffffff81626140>] ? gs_change+0x13/0x13
[  381.124258] Code: 38 03 00 00 48 8d 93 30 03 00 00 4c 89 ef e8 3b 01 20 e1 3e 41 ff 84 24 d4 00 00 00 4c 89 f7 e8 7a ff 52 e1 48 8b 83 70 03 00 00 <48> 8b b8 f8 00 00 00 e8 c7 42 fa e0 48 8b 5d e0 4c 8b 65 e8 4c
[  381.124258] RIP  [<ffffffffa00ed09d>] iscsit_add_cmd_to_response_queue+0x8d/0xc0 [iscsi_target_mod]
[  381.124258]  RSP <ffff880036855d70>
[  381.124258] CR2: 00000000000000f8
[  381.147997] ---[ end trace e0be12745e6881c7 ]---

So AFAICT this looks like a iblock bio completion callback is being
invoked more than once, that is being triggered when using multiple BIOs
to complete a single se_cmd descriptor.

Looking at the code in question from target_core_iblock.c, I think I see
what's going on now.  There is a double completion happening between the
iblock_complete_cmd call in iblock_execute_rw() and iblock_bio_done()
that appears to have been introduced with:

commit 5787cacd0bd5ee016ad807b244550d34fe2beebe
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Tue Apr 24 00:25:06 2012 -0400

    target: remove struct se_task

I'm currently testing the following patch to address this bug.

Please give it a shot with your setup.

Thanks!

--nab

diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index 57d7674..305aa41 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -658,7 +658,7 @@ static int iblock_execute_rw(struct se_cmd *cmd)
        bio_list_init(&list);
        bio_list_add(&list, bio);
 
-       atomic_set(&ibr->pending, 2);
+       atomic_set(&ibr->pending, 1);
        bio_cnt = 1;
 
        for_each_sg(sgl, sg, sgl_nents, i) {
@@ -689,7 +689,6 @@ static int iblock_execute_rw(struct se_cmd *cmd)
        }
 
        iblock_submit_bios(&list, rw);
-       iblock_complete_cmd(cmd);
        return 0;
 
 fail_put_bios:

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html