Re: LIO iSER small random IO performance.

On 11/6/2014 6:30 PM, Benjamin ESTRABAUD wrote:
> Hi All,
>
> When using LIO iSER over RoCE, we see variations in 8K read IOPS
> performance depending on the backend storage.
>
> If using a ramdisk backend storage (loop device created atop a 20G tmpfs
> RAM filesystem, or ramdisk_mcp which yields more or less the same
> performance) we get 2.5x less IOPS than when running "fio" locally.


That doesn't sound right...
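(For reference, a tmpfs-backed loop device like the one described above is typically set up along these lines; the mount point, backing-file name and size below are illustrative, not taken from the setup in question:)

  mount -t tmpfs -o size=20g tmpfs /mnt/ramdisk      # 20G RAM filesystem
  truncate -s 20G /mnt/ramdisk/backing.img           # sparse backing file
  losetup -f --show /mnt/ramdisk/backing.img         # attach to the first free loop device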

If using a "real" block backend (MD or LV interleaved (RAID 0 or
interleaved volume) built ontop of six Crucial M50 1TB SSDs) we get 3.4x
less IOPS than when running "fio" locally.

While we expected a small performance degradation between "local" IOs
and iSER ones, we did not expect to see a gap of 2.5x or 3.5x less IOPS.

That's also strange...


> Is this expected? It's hard to find proper unbiased benchmarks that
> compare local IOPS vs iSER IOPS. We don't get that issue when running
> nice large sequential IOs, where our local bandwidth is equivalent to
> our remote one. We were wondering if there was anything obvious we
> might have overlooked in our configuration. Any idea would be greatly
> appreciated.


I would like to know your initiator block layer settings such as:
- scheduler
- nomerges
- rq_affinity
- add_random
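(On the initiator these live in sysfs; a quick way to dump and, if desired, retune them is sketched below. /dev/sdX stands for whatever SCSI device the iSER LUN appears as, and the values shown are the usual low-latency suggestions, not necessarily the right ones for this setup:)

  grep . /sys/block/sdX/queue/scheduler /sys/block/sdX/queue/nomerges \
         /sys/block/sdX/queue/rq_affinity /sys/block/sdX/queue/add_random
  echo noop > /sys/block/sdX/queue/scheduler    # no elevator reordering
  echo 2    > /sys/block/sdX/queue/nomerges     # skip merge lookups
  echo 2    > /sys/block/sdX/queue/rq_affinity  # complete on the submitting CPU
  echo 0    > /sys/block/sdX/queue/add_random   # don't feed the entropy pool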

Also, I would like to understand your IRQ affinity placement on both
stations.
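(A quick way to dump the current placement on either side; the mlx4 prefix is what ConnectX-3 completion vectors are normally registered under, and IRQ 91 is just a made-up example number:)

  grep mlx4 /proc/interrupts            # which CPUs service the HCA's vectors
  cat /proc/irq/91/smp_affinity         # affinity mask of one vector (hex)
  echo 2 > /proc/irq/91/smp_affinity    # e.g. pin that vector to CPU1
  # note: irqbalance will rewrite these masks unless it is stopped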

Is it a single device? single session?

> The system configuration is as follows:
>
> Target node (Running LIO):
>
> * "Homemade" buildroot-based distribution, Linux 3.10.35 x86_64 (SMP),
> stock Infiniband drivers (*NO* OFED drivers).
> * Running on a Xeon E5-2695v2 (2.40GHz, 12 physical cores, 24 logical
> cores). HT is enabled (we therefore have 24 logical cores showing up in
> "top"), with 64GiB of RAM and a ConnectX-3 Pro 40Gb converged card
> configured as RoCE.
>
> Initiator node:
>
> * CentOS 6.5, running a "stock" upstream 3.10.59 x86_64 (SMP) kernel
> with default config from "make menuconfig". Again using stock Infiniband
> drivers (*NO* OFED drivers).
> * Running on a Xeon E3-1241v3 (3.5GHz, 4 physical cores, 8 logical
> cores). HT is enabled (8 cores show up in top), with 16GiB of RAM and a
> ConnectX-3 Pro 40Gb converged card configured as RoCE.
>
> Both cards are directly connected.
>
> Here are the "fio" tests and their respective results.
>
> NOTE: The same "fio" command is used on either the target (locally) or
> the initiator (over iSER).
>
> fio --filename=/dev/<device> --direct=1 --rw=randrw --ioengine=libaio
> --bs=8k --rwmixread=100 --iodepth=16 --numjobs=16 --runtime=60
> --group_reporting --name=test1
>
> /dev/loop0 (tmpfs ramdisk), local: 341k io/s
> /dev/loop0 (tmpfs ramdisk), remote (iSER): 186k io/s

I get the following results on ramdisk(_mcp) over iSER:
numjobs=16, iodepth=16: 254K IOPS
numjobs=16, iodepth=128: 297K IOPS

And my system isn't significantly different from yours:
Systems connected b2b - CX3 (VPI) single 40GE link
Both systems: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (8 cores, but only
1 is active on the target and 4 are active at the initiator)

Target OS: RHEL7.0
Initiator OS: RH6.4 (iser-1.5 package - same as upstream)

Rest of fio settings:
direct=1
rw=randread
bs=8k
runtime=60
group_reporting
name=test1
ioengine=libaio
time_based
loops=1
fsync_on_close=1
randrepeat=1
norandommap
exitall
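(Folded into a single command line, with the numjobs/iodepth values from the run above and a placeholder device, that is roughly:)

  fio --filename=/dev/<device> --direct=1 --rw=randread --bs=8k \
      --ioengine=libaio --iodepth=16 --numjobs=16 --runtime=60 --time_based \
      --loops=1 --fsync_on_close=1 --randrepeat=1 --norandommap --exitall \
      --group_reporting --name=test1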


> /dev/md_d1 (6*1TB Crucial M50 RAID0), local: 210k io/s
> /dev/md_d1 (6*1TB Crucial M50 RAID0), remote (iSER): 71.2k io/s
>
> CPU usage when running "fio" over iSER is about 65% of one core
> running "kworker" and 15% of that core in "hardware interrupt", with
> about 15-20% idle.


Haven't tried that - but I don't think you should see this gap...
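(One way to see that per-core breakdown of hard-IRQ, softirq and kworker time while the test runs, assuming the sysstat package is installed:)

  mpstat -P ALL 1    # per-CPU %irq / %soft / %idle, one-second samples
  top                # press '1' to expand the per-core summary lines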

> So here we know we can reach high IOPS on the backend storage directly,
> but somehow we're unable to get close when running over iSER, whether
> the backend storage is real disks or a memdisk. Also, the bottleneck is
> clearly not the iSER link,

Definitely not... the Link can carry way more than that...

> at least for the test on the RAID since we
> get over twice as many IOPS when running on a ramdisk backstore. The
> issue here is the difference between local IOPS and iSER IOPS.


I strongly recommend that you check out the Mellanox community's iSER/LIO/RDMA/PERF related posts at:
http://community.mellanox.com/content?filterID=all~objecttype~objecttype%5Bdocument%5D&query=Iser

Cheers,
Sagi.



