Re: Too big sectors - exceeding fabric_max_sectors

"Prantis, Kelsey" <kelsey.prantis@xxxxxxxxx> · Wed, 7 Nov 2012 17:11:29 +0000

Hi Nicholas,

I decided to try the patch you provided, since I have quite a few
devices and initiators and it seemed like it'd be a pain to maintain
that setting on every initiator. I applied it, along with the patch
you provided earlier this week for too-long SCSI IDs, on top of a
3.6.3-1.fc17 kernel, and now our target server oopses after a few
minutes of I/O. I've included the oops output below.

[  839.081918] BUG: unable to handle kernel NULL pointer dereference at
0000000000000028
[  839.082411] IP: [<ffffffffa00bf721>] iblock_bio_destructor+0x11/0x30
[target_core_iblock]
[  839.082411] PGD 0
[  839.082411] Oops: 0000 [#1] SMP
[  839.082411] Modules linked in: target_core_pscsi target_core_file
target_core_iblock iscsi_target_mod target_core_mod
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack
ip6table_filter ip6_tables virtio_balloon virtio_net i2c_piix4 i2c_core
microcode joydev virtio_blk
[  839.082411] CPU 0
[  839.082411] Pid: 0, comm: swapper/0 Not tainted
3.6.3-1.chromatestlab1.fc17.x86_64 #1 Red Hat KVM
[  839.082411] RIP: 0010:[<ffffffffa00bf721>]  [<ffffffffa00bf721>]
iblock_bio_destructor+0x11/0x30 [target_core_iblock]
[  839.082411] RSP: 0018:ffff88007fc03ce8  EFLAGS: 00010092
[  839.082411] RAX: 0000000000000000 RBX: ffff880071da0 RCX: 0000000000000400
[  839.082411] RDX: 0000000000080000 RSI: 0000000000000000 RDI:
ffff880071da6840
[  839.082411] RBP: ffff88007fc03ce8 R08: 0000000000000000 R09:
0000000000000000
[  839.082411] R10: 0000000000000000 R11: 0000000000000000 R12:
ffff88006945f1a0
[  839.082411] R13: 0000000000000000 R14: 0000000000080000 R15:
0000000000080000
[  839.082411] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000)
knlGS:0000000000000000
[  839.082411] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008
[  839.082411] CR2: 0000000000000028 CR3: 0000000036ca00 CR4: 00000000000006f0
[  839.082411] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  839.082411] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[  839.082411] Process swapper/0 (pid: 0, threadinfo ffffffff81c00000, task
ffffffff8c13420)
[  839.082411] Stack:
[  839.082411]  ffff88007fc03d08 ffffffffffff815442f4 ffff880071da6840
[  839.082411] ffff88007fc03d38 ffffffffa00bf640 ffff880079aa00
ffff880071da6840
[  839.082411]  0000000000000 ffff880079752d78 ffff88007fc03d48
ffffffff811c24cd
[  839.082411] Call Trace:
[  839.082411]  <IRQ>
[  839.082411]  [<ffffffff811c39a6>] bio_put+0x36/0x40
[  839.082411]  [<ffffffff815442f4>] ? ip_rcv+0x214/0x320
[  839.082411]  [<ffffffffa00bf640>] iblock_bio_d0 [target_core_iblock]
[  839.082411]  [<ffffffff811c24cd>] bio_endio+0x1d/0x40
[  839.082411]  [<ffffffff812b7513>] req_bio_endio.isra.53+0xa3/0xe0
[  839.082411]  [<ffffffff812b7648>] blk_update_request+0xf8/0x480
[  839.082411]  [<ffffffff81371375>] ? detach_buf+0x95/0xb0
[  839.082411]  [<ffffffff812b79f7>] blk_update_bidi_request+0x2xa0
[  839.082411]  [<ffffffff812bad80>] __blk_end_bidi_request+0x20/0x50
[  839.082411]  [<ffffffff812ba] __blk_end_request_all+0x1f/0x30
[  839.082411]  fffffa00000ba>] blk_done+0x5a/0x100 [virtio_blk]
[  839.082411]  [<ffffffff81371aac>] vring_interrupt+0c/0xa0
[  839.082411]  [<ffffffff810e7774>] handle_irq_event_percpu+0x54/0x1f
[  839.082411]  [<ffffffff810e7951>] handle_irq_event+0x41/0x70
[  839.082411]  [<ffffffff810ea16f>] handle_edge_irq+0x6f/0x110
[  839.082411]  [<fffffff101510f>] handle_irq+0xbf/0x150
[  839.082411]  [<ffffffff81620eb2>] ?
__atomic_notifier_call_chain+0x12/0x20
[  839.082411]  [<ffffffff8160ed6>] ? atomic_notifier_call_chain+0x16/0x20
[  839.082411]  [<ffffffff81626a8a>] do_IRQ+0x5a/0xe0
[  839.082411]  [<ffffffff8161d36a>] common_interrupt+0x6a/0x6a
[  839.082411]  <EOI>
[  839.082411]  [<ffffffff81041fd6>] ? native_safe_halt+0x6/0x10
[  839.082411]  [<ffffffff8101b85f>] default_idle+0x4f/0x1a0
[  839.082411]  [<ffffffff8101c58e>] cpu_idle+0xfe/0x120
[  839.082411]  [<ffffffff8a8e>] rest_init+0x72/0x74
[  839.082411]  [<ffffffff81cfbc2c>] start_kernel+0x3b9/0x3c6
[  839.082411]  [<ffffffff81cfb672>] repair_env_string+0x5e/0x5e
[  839.082411]  [<ffffffff81cfb35] x86_64_start_reservations+0x131/0x135
[  839.082411]  [<ffffffff81cfb45a>] x86_64_start_kernel+0x100/0x10f
[  839.082411] Code: f8  c3 48 c7 c7 68 12 0c a0 31 c0 fc 37 55 e1 eb db
0f 1f 80 00 00 55 48 89 e5 0f 1f 44 00 00 48 8b 7 58 48 8b 40 70 <48> 8b
40 28 0 08 02 00 00 e8 af 3d 10 e1 5d c3 66 66 66
[  839.082411] RIP  [<ffffffffa00bf721>] iblock_bio_destructor+0x11/0x30
[target_core_iblock]
[  839.082411]  RSP <ffff88007fc03ce8>
[  839.082411] CR2: 0000000000000028
[  839.082411] ---[ end trace 0005c37b3cb2ef5 ]---
[  839.082411] Kernel panic - not syncing: Fatal exception in interrupt

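The useful part of the oops is the first two lines: the fault address
and the faulting instruction pointer. A fault address of 0x28 with
RAX = 0 is consistent with reading a field at offset 0x28 through a
NULL pointer inside iblock_bio_destructor. Below is a minimal Python
sketch for pulling those fields out of a log; the helper name
summarize_oops and the two regular expressions are my own illustration,
not part of any kernel tooling, and the sample text is the two lines
above re-joined where the mail client wrapped them.

```python
import re

# Sample: the first two oops lines, with the mail-client line wraps undone.
OOPS = (
    "[  839.081918] BUG: unable to handle kernel NULL pointer dereference"
    " at 0000000000000028\n"
    "[  839.082411] IP: [<ffffffffa00bf721>] iblock_bio_destructor+0x11/0x30"
    " [target_core_iblock]\n"
)

# Hypothetical patterns written for this 3.6-era oops layout only.
FAULT_RE = re.compile(r"NULL pointer dereference at\s+([0-9a-f]+)")
IP_RE = re.compile(
    r"IP: \[<([0-9a-f]+)>\]\s+(\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)\s+\[(\w+)\]"
)


def summarize_oops(text):
    """Return fault address, symbol, offset, size and module, or None."""
    fault = FAULT_RE.search(text)
    ip = IP_RE.search(text)
    if fault is None or ip is None:
        return None
    return {
        "fault_addr": int(fault.group(1), 16),
        "symbol": ip.group(2),
        "offset": int(ip.group(3), 16),
        "size": int(ip.group(4), 16),
        "module": ip.group(5),
    }


summary = summarize_oops(OOPS)
print(summary)
```

The symbol+offset pair is what you would feed to a disassembler or to
addr2line against a matching target_core_iblock module to find the
exact source line.
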
Are there perhaps other patches between 3.6.3 and current that we need for
this to work?

Kelsey

On 11/6/12 6:41 PM, "Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx> wrote:

>On Tue, 2012-11-06 at 21:06 +0000, Prantis, Kelsey wrote:
>> Hello,
>
>Hi Kelsey,
>
>> I was hoping perhaps you guys might be able to help me figure out a new
>>problem I am having with doing I/O on my targets:
>> 
>> On my initiators, I am receiving these error messages in
>>/var/log/messages
>> 
>> Nov  6 20:15:38 hydra-2-ss-storage-appliance-1 kernel: sd 8:0:0:0:
>>[sde] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
>> Nov  6 20:15:38 hydra-2-ss-storage-appliance-1 kernel: sd 8:0:0:0:
>>[sde] Sense Key : Illegal Request [current]
>> Nov  6 20:15:38 hydra-2-ss-storage-appliance-1 kernel: sd 8:0:0:0:
>>[sde] Add. Sense: Invalid field in cdb
>> Nov  6 20:15:38 hydra-2-ss-storage-appliance-1 kernel: sd 8:0:0:0:
>>[sde] CDB: Write(10): 2a 00 00 02 dd 18 00 40 00 00
>> 
>> And over on my target server I am getting debug messages like these in
>>my dmesg:
>> 
>> [80076.570102] SCSI OP 2ah with too big sectors 10408 exceeds
>>fabric_max_sectors: 8192
>> [80076.598875] SCSI OP 2ah with too big sectors 16384 exceeds
>>fabric_max_sectors: 8192
>> [80079.416060] SCSI OP 2ah with too big sectors 16384 exceeds
>>fabric_max_sectors: 8192
>> [80079.425274] SCSI OP 2ah with too big sectors 16384 exceeds
>>fabric_max_sectors: 8192
>> [80079.432819] SCSI OP 2ah with too big sectors 16384 exceeds
>>fabric_max_sectors: 8192
>> [80079.439683] SCSI OP 2ah with too big sectors 8200 exceeds
>>fabric_max_sectors: 8192
>> 
>> This is the same setup I've described before, a 3.6.3-1.fc17 target
>>server, and el6 initiators.
>> 
>
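
The initiator and target messages quoted above describe the same
request: the transfer length encoded in the rejected Write(10) CDB is
the "too big sectors" count the target prints. A minimal Python sketch
of the decoding, assuming the standard 10-byte WRITE(10) layout (opcode
in byte 0, big-endian LBA in bytes 2-5, big-endian transfer length in
bytes 7-8); the function name decode_write10 is my own, and the 8192
limit is taken from the dmesg lines:

```python
def decode_write10(cdb_hex):
    """Decode LBA and transfer length (in blocks) from a WRITE(10) CDB."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7..8: transfer length
    return lba, blocks


FABRIC_MAX_SECTORS = 8192  # limit reported in the target's dmesg

# CDB copied from the initiator's /var/log/messages entry.
lba, blocks = decode_write10("2a 00 00 02 dd 18 00 40 00 00")
print(lba, blocks, blocks > FABRIC_MAX_SECTORS)  # 187672 16384 True
```

The decoded length of 16384 blocks matches the "too big sectors 16384"
lines on the target, which is why the command comes back as Illegal
Request / Invalid field in cdb.
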
>So at this point it's safe to bump these default+max values of
>fabric_max_sectors for IBLOCK + RAMDISK_MCP backends.  However, FILEIO
>will start to have problems here with > fabric_max_sectors=8192 payload
>size when the largish contiguous memory allocations for struct iovec
>with vfs_[read,write]v() ops start to fail.
>
>Here is a quick target patch that should get you running with IBLOCK and
>RAMDISK_MCP backends:
>
>diff --git a/include/target/target_core_base.h
>b/include/target/target_core_base.h
>index 5350f6e..26d9db9 100644
>--- a/include/target/target_core_base.h
>+++ b/include/target/target_core_base.h
>@@ -72,7 +72,7 @@
> /* Default unmap_granularity_alignment */
> #define DA_UNMAP_GRANULARITY_ALIGNMENT_DEFAULT 0
> /* Default max transfer length */
>-#define DA_FABRIC_MAX_SECTORS                  8192
>+#define DA_FABRIC_MAX_SECTORS                  16384
> /* Emulation for Direct Page Out */
> #define DA_EMULATE_DPO                         0
> /* Emulation for Forced Unit Access WRITEs */
>@@ -97,7 +97,7 @@
> /* Enforce SCSI Initiator Port TransportID with 'ISID' for PR */
> #define DA_ENFORCE_PR_ISIDS                    1
> #define DA_STATUS_MAX_SECTORS_MIN              16
>-#define DA_STATUS_MAX_SECTORS_MAX              8192
>+#define DA_STATUS_MAX_SECTORS_MAX              32768
> /* By default don't report non-rotating (solid state) medium */
> #define DA_IS_NONROT                           0
> /* Queue Algorithm Modifier default for restricted reordering in control
>mode page */
>
>--
>
>but FILEIO will likely end up being a special case where we have to
>issue multiple I/Os in the backend to handle the large request.  That
>said, I'm going to hold off merging these new defaults for the moment
>until that part can be sorted out with FILEIO.  (hch CC'ed)
>
>Also, another option in the short term to get something working would be
>to reduce the max_sectors_kb on the iSCSI initiator LLD side to 8192 (or
>smaller) to prevent the large CDB payloads from being generated on the
>client side.  (mnc + hare CC'ed)
>
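
Note that max_sectors_kb is expressed in kilobytes while
fabric_max_sectors counts sectors, so the two limits need a unit
conversion before they can be compared. A minimal Python sketch of that
arithmetic, assuming 512-byte logical blocks on the exported LUN (with
a different block size the figures change); the helper names are
hypothetical:

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical blocks


def sectors_to_kb(sectors, sector_bytes=SECTOR_BYTES):
    """Size in KiB of a request spanning the given number of sectors."""
    return sectors * sector_bytes // 1024


def kb_to_sectors(kb, sector_bytes=SECTOR_BYTES):
    """Number of sectors covered by a request of the given size in KiB."""
    return kb * 1024 // sector_bytes


# A fabric_max_sectors of 8192 corresponds to this many KiB per request.
print(sectors_to_kb(8192))   # 4096
# An initiator limit of 8192 KiB allows requests of this many sectors.
print(kb_to_sectors(8192))   # 16384
```

Under that assumption, the initiator-side value that keeps every
request within fabric_max_sectors=8192 works out to 4096 KiB, and the
16384-sector writes in the target log correspond to 8192 KiB requests.
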
>Thanks for reporting!
>
>--nab
>
