First, I use lio-utils instead of targetcli, as this is an embedded box
with very limited Python packages built in.

On Wed, Oct 2, 2013 at 5:26 PM, Nicholas A. Bellinger <nab@xxxxxxxxxxxxxxx> wrote:
> On Wed, 2013-10-02 at 14:07 -0500, Xianghua Xiao wrote:
>> after I changed default_cmdsn_depth to 64, when I use Iometer to do READ
>> only core0 is busy; for WRITE, all cores (12 of them) are equally busy.
>>
>
> Have you been able to isolate the issue down to per session
> performance..?  What happens when the same MD RAID backend is accessed
> across multiple sessions via a different TargetName+TargetPortalGroupTag
> endpoint..?  Does the performance stay the same..?
>
> Also, it would be useful to confirm with a rd_mcp backend to determine
> if it's something related to the fabric (eg: iscsi) or something related
> to the backend itself.
>

I have 12 RAID5 arrays built from 4 SSDs (each SSD has 8 partitions).
Only the first two commands of each key step are shown here:

tcm_node --block iblock_0/my_iblock0 /dev/md0
tcm_node --block iblock_1/my_iblock1 /dev/md1
...
lio_node --addlun iscsi-test0 1 0 lun_my_block iblock_0/my_iblock0
lio_node --addlun iscsi-test1 1 0 lun_my_block iblock_1/my_iblock1
...
lio_node --addnp iscsi-test0 1 172.16.0.1:3260
lio_node --addnp iscsi-test1 1 172.16.0.1:3260
...
lio_node --enabletpg iscsi-test0 1
lio_node --enabletpg iscsi-test1 1
...

After this, the Windows machine sees 12 new disk drives, which I format
as NTFS.

>> I created 12 targets (each has one LUN) for the 12 cores in this case;
>> still, the performance for both READ and WRITE is about 1/3 of what I
>> got with SCST in the past.
>>
>
> Can you send along your rtsadmin/targetcli configuration output in order
> to get an idea of the setup..?  Also, any other information about the
> backend configuration + hardware would be useful as well.
>
> Also, can you give some specifics on the workload in question..?
>

The workload is generated by Iometer: I created a 64KB 100% sequential
WRITE workload and a 128KB 100% sequential READ workload against all 12
iSCSI disks per worker. After that I duplicate the workers, to 4, 8, or
12, for example.

No matter what I try, the performance is roughly 1/3 of what I get with
SCST under similar settings (12 RAID5 iSCSI targets + Iometer). For
example, with SCST I can easily get wire speed (10Gbps) for READ, while
with LIO I can get at most 3.8Gbps.

For READ, core0 is 0% idle during the test while the remaining 11 cores
are about 80% idle each. For WRITE, all 12 cores are about 10% idle.
With SCST, by comparison, the load is always distributed nearly evenly
across all cores for both READ and WRITE under Iometer.

>> is LIO-iSCSI on 3.8.x 'best' for 10/100/1G networks only? Other than
>> the DEFAULT_CMDSN_DEPTH definition, what else could I tune for 10G/40G
>> iSCSI? Again, I am using the same scheduler, fifo_batch,
>> stripe_cache_size, read_ahead_kb, etc. parameters as I used with SCST;
>> the only major difference is LIO vs SCST itself.
>
> If you're on IB/RoCE/iWARP verbs-capable hardware, I'd very much
> recommend checking out the iser-target that is included in >= v3.10
> kernels.

I have to use 3.8.x for now, and am testing iSCSI/LIO at the moment,
before moving to FCoE soon.

Thanks!

>
> --nab
>
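P.S. For completeness, the per-TPG default_cmdsn_depth can also be set
at runtime through configfs rather than by editing the
DEFAULT_CMDSN_DEPTH definition; this is only a minimal sketch, assuming
the stock LIO configfs layout under /sys/kernel/config/target and the
iscsi-test* target names used above (verify the exact attribute path on
your kernel build):

# Sketch only: assumes the standard LIO iSCSI configfs layout and the
# iscsi-test0..iscsi-test11 targets created earlier in this mail.
# The new depth should only apply to sessions that log in after the
# change, so run this before enabling the TPGs / logging in initiators.
for i in $(seq 0 11); do
    echo 64 > /sys/kernel/config/target/iscsi/iscsi-test${i}/tpgt_1/attrib/default_cmdsn_depth
done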