Hi All,
When using LIO iSER over RoCE, we see a significant drop in 8K read IOPS
compared to running the same workload locally, and the size of the drop
depends on the backend storage.
If using a ramdisk backend (a loop device created on top of a 20G tmpfs
RAM filesystem, or ramdisk_mcp, which yields more or less the same
performance), we get 2.5x fewer IOPS than when running "fio" locally.
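For reference, the loop device is set up more or less like this (a rough
sketch; the mount point and file name below are placeholders, only the
20G size is taken from our setup):

  mount -t tmpfs -o size=20g tmpfs /mnt/ramdisk        # 20G RAM filesystem
  dd if=/dev/zero of=/mnt/ramdisk/disk.img bs=1M count=20480
  losetup /dev/loop0 /mnt/ramdisk/disk.img             # /dev/loop0 is the backend device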
If using a "real" block backend (MD or LV interleaved (RAID 0 or
interleaved volume) built ontop of six Crucial M50 1TB SSDs) we get 3.4x
less IOPS than when running "fio" locally.
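The MD device is created along these lines (again a sketch; the member
device names and chunk size here are illustrative, not copied from our
actual config):

  mdadm --create /dev/md_d1 --level=0 --raid-devices=6 --chunk=64 \
        /dev/sd[b-g]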
While we expected a small performance degradation between "local" IOs
and iSER ones, we did not expect a gap of 2.5x to 3.4x fewer IOPS.
Is this expected? It's hard to find proper, unbiased benchmarks that
compare local IOPS with iSER IOPS. We don't see this issue when running
large sequential IOs, where our local bandwidth is equivalent to our
remote one. We were wondering if there is anything obvious we might have
overlooked in our configuration. Any idea would be greatly appreciated.
The system configuration is as follows:
Target node (Running LIO):
* "Homemade" buildroot based distribution, Linux 3.10.35 x86_64 (SMP),
stock Infiniband drivers (*NO* OFED drivers).
* Running on a Xeon E5-2695v2 (2.40Ghz, 12 physical cores, 24 logical
cores). HT is enabled (we therefore have 24 logical cores showing up in
"top"), with 64GiB of RAM and a ConnectX-3 Pro 40Gb converged card
configured as RoCE.
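The LUN is exported roughly as follows (reconstructed from memory using
targetcli; the IQN, backstore name and portal IP are placeholders, and
depending on the targetcli version the iSER toggle may be spelled
slightly differently):

  targetcli /backstores/iblock create name=ram0 dev=/dev/loop0
  targetcli /iscsi create iqn.2014-11.com.example:target0
  targetcli /iscsi/iqn.2014-11.com.example:target0/tpg1/luns create /backstores/iblock/ram0
  targetcli /iscsi/iqn.2014-11.com.example:target0/tpg1/portals create 10.0.0.1 3260
  # switch the portal from plain iSCSI/TCP to iSER
  targetcli /iscsi/iqn.2014-11.com.example:target0/tpg1/portals/10.0.0.1:3260 enable_iser true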
Initiator node:
* CentOS 6.5, running a "stock" upstream 3.10.59 x86_64 (SMP) kernel
with the default config from "make menuconfig", again with stock
in-kernel Infiniband drivers (*NO* OFED drivers).
* Running on a Xeon E3-1241 v3 (3.5GHz, 4 physical cores, 8 logical
cores; HT is enabled, so 8 cores show up in "top"), with 16GiB of RAM
and a ConnectX-3 Pro 40Gb converged card configured as RoCE.
Both cards are directly connected.
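On the initiator side the session is brought up over iSER roughly like
this (a sketch; the IP and IQN are the same placeholders as above):

  iscsiadm -m discovery -t sendtargets -p 10.0.0.1
  iscsiadm -m node -T iqn.2014-11.com.example:target0 -p 10.0.0.1 \
           --op update -n iface.transport_name -v iser
  iscsiadm -m node -T iqn.2014-11.com.example:target0 -p 10.0.0.1 --login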
Here are the "fio" tests and their respective results.
NOTE: The same "fio" command is used on either the target (locally) or
the initiator (over iSER).
fio --filename=/dev/<device> --direct=1 --rw=randrw --ioengine=libaio \
    --bs=8k --rwmixread=100 --iodepth=16 --numjobs=16 --runtime=60 \
    --group_reporting --name=test1
/dev/loop0 (tmpfs ramdisk), local: 341k IOPS
/dev/loop0 (tmpfs ramdisk), remote (iSER): 186k IOPS
/dev/md_d1 (6x 1TB Crucial M50, RAID 0), local: 210k IOPS
/dev/md_d1 (6x 1TB Crucial M50, RAID 0), remote (iSER): 71.2k IOPS
When running "fio" over iSER, CPU usage is about 65% of one core in
"kworker" plus about 15% of that core in hardware interrupts, with
roughly 15-20% of it idle.
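For reference, the per-core breakdown can be seen with standard tools
along these lines (a sketch; the "mlx4" match assumes the stock
ConnectX-3 driver naming):

  mpstat -P ALL 1               # per-core %usr/%sys/%irq/%soft/%idle while fio runs
  grep mlx4 /proc/interrupts    # how the HCA completion vectors are spread over cores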
So we know we can reach high IOPS on the backend storage directly, but
somehow we're unable to get close to that when running over iSER,
whether the backend storage is real disks or a ramdisk. Also, the
bottleneck is clearly not the iSER link itself, at least for the RAID
test, since we get over twice as many IOPS when running on the ramdisk
backstore. The issue here is the difference between local IOPS and iSER
IOPS.
Thanks a lot in advance for your help!
Regards,
Ben - MPSTOR.