Re: Performance issue with O_DIRECT

Hi Dragan,

On Thu, 2015-09-17 at 19:04 +0200, Dragan Milivojević wrote:
> Hi all
> 
> While building a NAS/SAN server for a local video production company, I
> stumbled on a couple of issues and errors. It seems that MaxConnections is
> not enforced, and MC/S is broken in every target version I tried when using
> a Win7 initiator.

Comments inline below.

> With a Linux initiator, MC/S works as expected.
> 

What Linux initiator are you using that supports MC/S btw..?

> The show-stopper issue in my current use case is bad performance when using
> 64k direct I/O.
> 
> At first I assumed that the Windows iSCSI initiator was broken (like most
> things in Windows). To verify this I connected to the server using a Linux
> (Fedora 20) initiator, and the issue persists. I have tested this using
> block and ramdisk backstores.
> 
> For example: direct I/O to an exported hard disk partition is capped at
> around 48MB/s.
> This was also verified by monitoring the network traffic.
> With ramdisk or nullio backstores, transfer maxes out at around 60MB/s.
> 
> When using buffered I/O, transfer speed is limited by the 1G Ethernet link
> and maxes out at around 112MB/s.
> 
> Transfer speed of the hard drive (/dev/sdc6) on the server when using
> 64k direct I/O is around 150MB/s.
> 
> The hard drive and the exported partition weren't being used on the server
> for anything else (it's a spare disk for soft RAID), and the server was idle.
> 
> CentOS 7 was used, with default kernels (the RHEL 3.10 series) and also
> mainline 4.2. Target engine versions 4.1 and 5.0.
> 
> I have attached the target configuration and test outputs and also
> errors produced by target when MC/S is used with Windows initiator.
> 
> For easy viewing I also uploaded the attached file to pastebin:
> http://pastebin.com/N2xVC8G5
> 
> Any ideas on what is causing this issue with direct I/O?
> 
> Thanks
> Dragan

So looking at the performance results, I'm a bit confused..

[root@localhost ~]# lsscsi | grep "LIO"
[11:0:0:0]   disk    LIO-ORG  pv0              4.0   /dev/sdc 
[11:0:0:1]   disk    LIO-ORG  lv0              4.0   /dev/sdf 
[11:0:0:2]   disk    LIO-ORG  rd0              4.0   /dev/sde 
[11:0:0:3]   disk    LIO-ORG  rd1              4.0   /dev/sdd 

----------------------------- <IBLOCK + pv0>

[root@localhost ~]# dd_rescue -ADv -m 16G -b 64k /dev/zero /dev/sdc
dd_rescue: (info): ipos:  16773120.0k, opos:  16773120.0k, xferd:  16773120.0k
                   errs:      0, errxfer:         0.0k, succxfer:  16773120.0k
             +curr.rate:    48464kB/s, avg.rate:    48443kB/s, avg.load:  3.3%
             >----------------------------------------.<  99%  ETA:  0:00:00 
dd_rescue: (info): Summary for /dev/zero -> /dev/sdc
dd_rescue: (info): ipos:  16777216.0k, opos:  16777216.0k, xferd:  16777216.0k
                   errs:      0, errxfer:         0.0k, succxfer:  16777216.0k
             +curr.rate:    45179kB/s, avg.rate:    48439kB/s, avg.load:  3.3%
             >-----------------------------------------< 100%  TOT:  0:05:46 

----------------------------- <RAMDISK-MCP>

[root@localhost ~]# dd_rescue -ADv -m 4G -b 64k /dev/zero /dev/sdd
dd_rescue: (info): ipos:   4191232.0k, opos:   4191232.0k, xferd:   4191232.0k
                   errs:      0, errxfer:         0.0k, succxfer:   4191232.0k
             +curr.rate:    62343kB/s, avg.rate:    62376kB/s, avg.load:  4.6%
             >----------------------------------------.<  99%  ETA:  0:00:00 
dd_rescue: (info): Summary for /dev/zero -> /dev/sdd
dd_rescue: (info): ipos:   4194304.0k, opos:   4194304.0k, xferd:   4194304.0k
                   errs:      0, errxfer:         0.0k, succxfer:   4194304.0k
             +curr.rate:    61934kB/s, avg.rate:    62375kB/s, avg.load:  4.6%
             >-----------------------------------------< 100%  TOT:  0:01:07 

----------------------------- <RAMDISK-MCP + NULLIO>

[root@localhost ~]# dd_rescue -ADv -m 4G -b 64k /dev/zero /dev/sde
dd_rescue: (info): ipos:   4188160.0k, opos:   4188160.0k, xferd:   4188160.0k
                   errs:      0, errxfer:         0.0k, succxfer:   4188160.0k
             +curr.rate:    62515kB/s, avg.rate:    61642kB/s, avg.load:  4.8%
             >----------------------------------------.<  99%  ETA:  0:00:00 
dd_rescue: (info): Summary for /dev/zero -> /dev/sde
dd_rescue: (info): ipos:   4194304.0k, opos:   4194304.0k, xferd:   4194304.0k
                   errs:      0, errxfer:         0.0k, succxfer:   4194304.0k
             +curr.rate:    62403kB/s, avg.rate:    61643kB/s, avg.load:  4.8%
             >-----------------------------------------< 100%  TOT:  0:01:08 

Buffered I/O
----------------------------- <Without O_DIRECT>
[root@localhost ~]# dd_rescue -Av -m 16G -b 64k /dev/zero /dev/sdc
dd_rescue: (info): ipos:  16776192.0k, opos:  16776192.0k, xferd:  16776192.0k
                   errs:      0, errxfer:         0.0k, succxfer:  16776192.0k
             +curr.rate:   115223kB/s, avg.rate:   136634kB/s, avg.load: 14.7%
             >----------------------------------------.<  99%  ETA:  0:00:00 
dd_rescue: (info): Summary for /dev/zero -> /dev/sdc
dd_rescue: (info): ipos:  16777216.0k, opos:  16777216.0k, xferd:  16777216.0k
                   errs:      0, errxfer:         0.0k, succxfer:  16777216.0k
             +curr.rate:    86427kB/s, avg.rate:   112677kB/s, avg.load: 12.5%
             >-----------------------------------------< 100%  TOT:  0:02:29 

If RAMDISK-MCP + O_DIRECT is not able to reach ~110 MB/sec, up to the
limit of a unidirectional 1 Gb/sec port, it would indicate there is
something else going on outside of the iSCSI application-layer code.
As a sanity check: 1 Gb/sec is 125 MB/sec raw, and after Ethernet + TCP/IP
+ iSCSI header overhead roughly 110-117 MB/sec of payload is achievable,
which is what your buffered I/O run above shows.  ~62 MB/sec is barely
half of that.

So I'd recommend first verifying the network with iperf, to confirm that
normal socket I/O is able to reach 1 Gb/sec in both directions.
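
For example, with classic iperf (the 30 second duration and -r tradeoff
mode, which runs the test in both directions sequentially, are just
suggestions; <target-ip> is a placeholder):

  target# iperf -s
  initiator# iperf -c <target-ip> -t 30 -r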

From there, I'd recommend using fio with different iodepth settings +
direct=1 in order to determine at which point you're able to saturate
the 1 Gb/sec link with blocksize=64k.
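
A minimal sketch of such a run (the job name and 60 second runtime are
arbitrary, and /dev/sdd here is the ramdisk LUN from your lsscsi output,
which this run overwrites; sweep --iodepth through 1, 2, 4, 8, ... and
watch where throughput levels off):

  fio --name=64k-seq-write --filename=/dev/sdd --rw=write --bs=64k \
      --direct=1 --ioengine=libaio --iodepth=4 \
      --runtime=60 --time_based --group_reporting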

I'd recommend setting 'echo noop > /sys/block/$DEV/queue/scheduler'
on the Linux iSCSI initiator LUNs as well.
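
For instance, covering the four LIO LUNs from the lsscsi output above
(the sdc..sdf names are taken from your listing and may differ after a
re-login):

  for dev in sdc sdd sde sdf; do
      echo noop > /sys/block/$dev/queue/scheduler
  done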

Also keep in mind that for sequential I/O, the initiator side will be
merging 64k requests into larger ones.  You can run 'iostat -xm 2'
on both sides to see what I/O sizes are actually being generated.
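
e.g., to watch a single LUN (the avgrq-sz column is the average request
size in 512-byte sectors, so unmerged 64k requests show up as 128, and
anything larger means merging is happening):

  iostat -xm 2 /dev/sdc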

For the other items:

4.2.0-1.el7.elrepo.x86_64 MC/S Win 7 Initiator
======================================================================================================================================           

Sep 17 02:08:08 storage kernel: Got unknown iSCSI OpCode: 0x43
Sep 17 02:08:08 storage kernel: Cannot recover from unknown opcode while ERL=0, closing iSCSI connection.


Taking a look at this now.

3.10.0-229.14.1.el7.x86_64 MC/S Win 7 Initiator
======================================================================================================================================

Sep 17 02:25:04 storage kernel: ------------[ cut here ]------------
Sep 17 02:25:04 storage kernel: WARNING: at mm/page_alloc.c:2477 __alloc_pages_nodemask+0x46a/0xb90()
Sep 17 02:25:04 storage kernel: Modules linked in: target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod bonding vfat fat ext4 mbcache jbd2 kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw eeepc_wmi asus_wmi gf128mul glue_helper sparse_keymap rfkill ablk_helper cryptd sp5100_tco tpm_infineon fam15h_power pcspkr k10temp serio_raw edac_mce_amd edac_core i2c_piix4 shpchp acpi_cpufreq xfs libcrc32c raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq sd_mod crc_t10dif crct10dif_common nouveau video i2c_algo_bit drm_kms_helper ttm mxm_wmi r8169 mii drm e1000e ahci libahci i2c_core libata mpt2sas ptp raid_class pps_core scsi_transport_sas wmi dm_mirror dm_region_hash dm_log dm_mod
Sep 17 02:25:04 storage kernel: CPU: 5 PID: 2039 Comm: iscsi_np Not tainted 3.10.0-229.14.1.el7.x86_64 #1
Sep 17 02:25:04 storage kernel: Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A99FX PRO R2.0, BIOS 2501 04/07/2014
Sep 17 02:25:04 storage kernel: 0000000000000000 0000000046689ffd ffff88023092faa8 ffffffff81604516
Sep 17 02:25:04 storage kernel: ffff88023092fae0 ffffffff8106e29b 000000000012c0d0 0000000000000000
Sep 17 02:25:04 storage kernel: 000000000010c0d0 0000000000000000 0000000000000000 ffff88023092faf0
Sep 17 02:25:04 storage kernel: Call Trace:

<SNIP>

Sep 17 02:25:04 storage kernel: Unable to allocate memory for tmpbuf.
Sep 17 02:25:04 storage kernel: iSCSI Login negotiation failed.

So this one was fixed for mainline in May 2014, and picked up by linux-3.10.y stable here:

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/target/iscsi?h=linux-3.10.y&id=d5c55fa31a2947ac517abddd7ca6975a17bef178

but it looks like it was not picked up for the RHEL 7.1 / 3.10.0-229.14.1.el7.x86_64 kernel.

--nab



