LIO iSER small random IO performance.

Hi All,

When using LIO iSER over RoCE, we see large variations in 8K random read IOPS compared to running the same workload locally, depending on the backend storage.

With a ramdisk backstore (a loop device created on top of a 20G tmpfs RAM filesystem, or a ramdisk_mcp backstore, which performs roughly the same), we get 2.5x fewer IOPS than when running "fio" locally.
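For reference, the tmpfs-backed loop device is created along these lines (mount point, file name and loop device are illustrative, not necessarily our exact commands):

mount -t tmpfs -o size=20g tmpfs /mnt/ramdisk
truncate -s 20G /mnt/ramdisk/backing.img   # sparse file, pages are allocated from RAM on first write
losetup /dev/loop0 /mnt/ramdisk/backing.img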

If using a "real" block backend (MD or LV interleaved (RAID 0 or interleaved volume) built ontop of six Crucial M50 1TB SSDs) we get 3.4x less IOPS than when running "fio" locally.

While we expected some performance degradation between "local" IO and iSER IO, we did not expect a gap of 2.5x to 3.4x fewer IOPS.

Is this expected? It's hard to find proper, unbiased benchmarks comparing local IOPS with iSER IOPS. We do not see this issue with large sequential IO, where our remote bandwidth matches our local bandwidth. We were wondering whether there is anything obvious we might have overlooked in our configuration; any idea would be greatly appreciated.

The system configuration is as follows:

Target node (Running LIO):

* "Homemade" buildroot based distribution, Linux 3.10.35 x86_64 (SMP), stock Infiniband drivers (*NO* OFED drivers). * Running on a Xeon E5-2695v2 (2.40Ghz, 12 physical cores, 24 logical cores). HT is enabled (we therefore have 24 logical cores showing up in "top"), with 64GiB of RAM and a ConnectX-3 Pro 40Gb converged card configured as RoCE.

Initiator node:

* CentOS 6.5, running a "stock" upstream 3.10.59 x86_64 (SMP) kernel with the default "make menuconfig" config. Again using stock in-kernel InfiniBand drivers (*NO* OFED drivers).
* Xeon E3-1241v3 (3.5GHz, 4 physical cores, 8 logical cores). HT is enabled, so 8 logical cores show up in "top". 16GiB of RAM and a ConnectX-3 Pro 40Gb converged card configured as RoCE.
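On the initiator, the session is established over the iSER transport roughly as follows (target IP and IQN are placeholders):

iscsiadm -m discovery -t sendtargets -p 192.168.1.1
iscsiadm -m node -T iqn.2014-11.com.example:target1 -p 192.168.1.1 --op update -n iface.transport_name -v iser
iscsiadm -m node -T iqn.2014-11.com.example:target1 -p 192.168.1.1 --login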

Both cards are directly connected.

Here are the "fio" tests and their respective results.

NOTE: The same "fio" command is used on either the target (locally) or the initiator (over iSER).

fio --filename=/dev/<device> --direct=1 --rw=randrw --ioengine=libaio --bs=8k --rwmixread=100 --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=test1

/dev/loop0 (tmpfs ramdisk), local: 341k io/s
/dev/loop0 (tmpfs ramdisk), remote (iSER): 186k io/s

/dev/md_d1 (6*1TB Crucial M50 RAID0), local: 210k io/s
/dev/md_d1 (6*1TB Crucial M50 RAID0), remote (iSER): 71.2k io/s

CPU usage when running "fio" over iSER is about 65% of one core in a "kworker" thread and about 15% of that same core in hardware interrupt handling, leaving roughly 15-20% of it idle.
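In case it is relevant, the spread of the mlx4 completion vector interrupts across cores can be inspected with something like the following (the IRQ number N is a placeholder; we have not included output here):

grep mlx4 /proc/interrupts
cat /proc/irq/N/smp_affinity   # CPU mask currently allowed to service that IRQ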

So we know the backend storage can reach high IOPS when accessed directly, but we cannot get close to that over iSER, whether the backstore is real disks or a ramdisk. The bottleneck is clearly not the iSER link itself, at least for the RAID test, since the ramdisk backstore delivers over twice as many IOPS over the same link. The issue is the gap between local IOPS and iSER IOPS.

Thanks a lot in advance for your help!

Regards,
Ben - MPSTOR.



