Re: Performance issue with O_DIRECT

On Fri, 2015-09-18 at 04:31 +0200, Dragan Milivojević wrote:
> I have no experience with MC/S (multipathd has served me well) so
> maybe this is PEBKAC?
> Commands that I used:

<SNIP>

> [root@localhost ~]# lsscsi | grep "LIO"
> [10:0:0:0]   disk    LIO-ORG  rd0              4.0   /dev/sdc
> [11:0:0:0]   disk    LIO-ORG  rd0              4.0   /dev/sdd
> 
> iscsiadm -m session -P 3 output is attached in iscsi_session.txt
> 

To clarify, MC/S means a single session (e.g. each $HOST:X:X:X) has more
than a single TCP connection.

The above with open-iscsi is SC/S (single connection per session) mode.
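
A quick way to confirm which mode you're in is to count the TCP
connections behind each session in sysfs. A minimal sketch, assuming the
usual iscsi transport class layout (session IDs and paths will differ on
your box):

    # One iscsi_connection entry exists per TCP connection; the number
    # before the colon is the owning session ID.
    for s in /sys/class/iscsi_session/session*; do
        sid=${s##*/session}
        conns=$(ls -d /sys/class/iscsi_connection/connection${sid}:* 2>/dev/null | wc -l)
        echo "session ${sid}: ${conns} connection(s)"
    done

With open-iscsi you should see exactly one connection per session, which
matches the lsscsi output above.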

> 
> > So looking at the performance results, I'm a bit confused..
> 
> Sorry, I should have commented on those results a bit better.
> 
> > [root@localhost ~]# lsscsi | grep "LIO"
> > [11:0:0:0]   disk    LIO-ORG  pv0              4.0   /dev/sdc
> > [11:0:0:1]   disk    LIO-ORG  lv0              4.0   /dev/sdf
> > [11:0:0:2]   disk    LIO-ORG  rd0              4.0   /dev/sde
> > [11:0:0:3]   disk    LIO-ORG  rd1              4.0   /dev/sdd
> >

<SNIP>

> > If RAMDISK-MCP + O_DIRECT is not able to reach ~110 MB/sec up to the
> > limit of a unidirectional 1 Gb/sec port, it would indicate there is
> > something else going on outside of iscsi application layer code.
> 
> I eliminated everything else that I thought of; we can walk through it all again.
> 
> > So I'd recommend first verifying the network using iperf that normal
> > socket I/O is able to reach 1 Gb/sec in both directions.
> 
> This was done during the initial server setup, but I ran the tests again.
> Output from iperf and iperf3 is attached in iperf_tests.txt
> 
> Network cards used: HP NC364T, Intel 82571EB, on client and server.
> While iperf was running, I monitored the network throughput with dstat.
> Bandwidth was about 118MB/s even with the bidirectional test, so I guess the
> network can be ruled out?

Correct, this looks as expected for a single 1 Gb/sec port.
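
For the archives, the sort of check I mean, using iperf3 (the hostname is
a placeholder; run against 'iperf3 -s' on the other box):

    # Forward direction: initiator -> target
    iperf3 -c target.example.com -t 30
    # Reverse direction: target -> initiator
    iperf3 -c target.example.com -t 30 -R

118 MB/s is right at the practical ceiling anyway: 1 Gb/sec is 125 MB/sec
raw, and Ethernet + IP + TCP framing at 1500 MTU eats roughly 5-6% of
that.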

> 
> >
> > From there, I'd recommend using fio with different iodepth settings +
> > direct=1 in order to determine at which point you're able to saturate
> > the 1 Gb/sec link with blocksize=64k.
> >
> > I'd recommend setting 'echo noop > /sys/block/$DEV/queue/scheduler'
> > on the Linux iscsi initiator LUNs as well.
> >
> > Also keep in mind that for sequential I/O, the initiator side will be
> > doing merging of 64k to larger requests.  You can run 'iostat -xm 2'
> > on both sides to see what I/O sizes are actually being generated.
> 
> iostat on the client and server consistently reports an avgrq-sz of 128, so
> I'm not seeing that.  The kernel version on the client is 4.1.5, if that
> makes any difference.
> 
> I have done the suggested tests; the scheduler was set to noop.
> I used a ramdisk and a partition on a hard drive as backstores.
> 
> Results from the tests are summarized below; the full test output is
> attached in fio_scsi_tests.txt
> 
> iodepth 1, ramdisk, 64k block size, bandwidth 62MB/s, client iostat avgrq-sz 128
> iodepth 2, ramdisk, 64k block size, bandwidth 112MB/s, client iostat avgrq-sz 128
> iodepth 3, ramdisk, 64k block size, bandwidth 120MB/s, client iostat avgrq-sz 128
> 
> I also ran tests with iodepth set to 1 while changing the block size:
> 
> iodepth 1, ramdisk, 128k block size, bandwidth 63MB/s, client iostat avgrq-sz 256
> iodepth 1, ramdisk, 256k block size, bandwidth 80MB/s, client iostat avgrq-sz 512
> iodepth 1, ramdisk, 512k block size, bandwidth 88MB/s, client iostat avgrq-sz 1024
> iodepth 1, ramdisk, 1024k block size, bandwidth 91MB/s, client iostat avgrq-sz 2048
> iodepth 1, ramdisk, 4096k block size, bandwidth 93MB/s, client iostat avgrq-sz 8192
> iodepth 1, ramdisk, 8192k block size, bandwidth 114MB/s, client iostat avgrq-sz 8192
> iodepth 1, ramdisk, 16384k block size, bandwidth 116MB/s, client iostat avgrq-sz 8192
> 

So in order to reach 1 Gb/sec port saturation, you'll need to push
iodepth > 1 at 64k blocksize, or use a larger blocksize at iodepth=1.

This is about what I'd expected for 1 Gb/sec @ 1500 MTU btw.
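
For reference, the kind of fio invocation I had in mind for those two
cases (a sketch only, with /dev/sdX standing in for the iscsi LUN, and
rw=read used to keep it non-destructive):

    # Set the noop scheduler on the initiator-side LUN, as recommended above
    echo noop > /sys/block/sdX/queue/scheduler

    # Case 1: 64k blocksize, iodepth > 1
    fio --name=qd4-64k --filename=/dev/sdX --direct=1 --rw=read \
        --bs=64k --ioengine=libaio --iodepth=4 --runtime=30 --time_based

    # Case 2: iodepth=1, larger blocksize
    fio --name=qd1-1m --filename=/dev/sdX --direct=1 --rw=read \
        --bs=1M --ioengine=libaio --iodepth=1 --runtime=30 --time_based

Also, when reading the iostat numbers: avgrq-sz is in 512-byte sectors,
so the 128 you see at bs=64k is exactly 64k per request (i.e. no
merging), and the 8192 ceiling works out to 4 MB per request.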

> Hard drive as backstore:
> 
> iodepth 1, block, 64k block size, bandwidth 50MB/s, client iostat avgrq-sz 128, server iostat avgrq-sz 128
> iodepth 2, block, 64k block size, bandwidth 94MB/s, client iostat avgrq-sz 128, server iostat avgrq-sz 128
> iodepth 3, block, 64k block size, bandwidth 113MB/s, client iostat avgrq-sz 128, server iostat avgrq-sz 128
> iodepth 4, block, 64k block size, bandwidth 118MB/s, client iostat avgrq-sz 128, server iostat avgrq-sz 128
> 

This looks about as I'd expected as well.

> Changing the block size has some effect, but unfortunately bandwidth
> maxes out at 70MB/s even when the block size was set to 32768K.  At this
> setting, avgrq-sz on the server was 1024 and on the client 8192.
> 

Using a > 1500 byte MTU may help get you closer to 1 Gb/sec saturation
at iodepth=1.
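
If the NICs and switch support jumbo frames, that's just the following on
both ends (eth0 is a placeholder; every switch port in the path has to
allow the larger MTU as well):

    # Raise the MTU to 9000, then verify end-to-end with a
    # don't-fragment ping: ping -M do -s 8972 <peer>
    ip link set dev eth0 mtu 9000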

Otherwise fio w/ iodepth>1 looks normal for 1 Gb/sec.

> 
> > Sep 17 02:08:08 storage kernel: Got unknown iSCSI OpCode: 0x43
> > Sep 17 02:08:08 storage kernel: Cannot recover from unknown opcode while ERL=0, closing iSCSI connection.
> >
> >
> > Taking a look at this now.
> 
> Thanks

I'm able to reproduce this with v4.3-rc1 code.

Note this bug is not specific to MC/S operation; it appears to be a
regression that only affects MSFT iSCSI initiators.

Still debugging this, but I should have a bug-fix soon.

Thanks a lot for reporting this.

--nab



