Re: iSCSI write performance

Mike Christie <mchristi@xxxxxxxxxx> · Fri, 25 Oct 2019 10:54:01 -0500

On 10/24/2019 11:47 PM, Ryan wrote:
> I'm using CentOS 7.7.1908 with kernel 3.10.0-1062.1.2.el7.x86_64. The
> workload was a VMware Storage Motion from a local SSD backed datastore

Ignore my comments. I thought you were just doing fio-like tests in the VM.

> to the ceph backed datastore. Performance was measured using dstat on
> the iscsi gateway for network traffic and ceph status as this cluster is
> basically idle.  I changed max_data_area_mb to 256 and cmdsn_depth to
> 128. This appears to have given a slight improvement of maybe 10MB/s. 
> 
> Moving VM to the ceph backed datastore
> io:
>     client:   124 KiB/s rd, 76 MiB/s wr, 95 op/s rd, 1.26k op/s wr
> 
> Moving VM off the ceph backed datastore
>   io:
>     client:   344 MiB/s rd, 625 KiB/s wr, 5.54k op/s rd, 62 op/s wr
> 

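As a rough sanity check, dividing the throughput by the op rate in your ceph status output gives an average IO size as seen by the cluster. A minimal sketch, assuming "1.26k" and "5.54k" mean 1260 and 5540 op/s (ceph rounds these) and ignoring the small traffic in the opposite direction:

```python
# Rough average IO size from the "ceph status" client io lines quoted above.
# Assumption: "1.26k op/s" is 1260 and "5.54k op/s" is 5540 (rounded by ceph).
KIB_PER_MIB = 1024


def avg_io_kib(mib_per_s: float, ops_per_s: float) -> float:
    """Average IO size in KiB given throughput (MiB/s) and op rate (op/s)."""
    return mib_per_s * KIB_PER_MIB / ops_per_s


write_kib = avg_io_kib(76, 1260)   # moving the VM to the ceph backed datastore
read_kib = avg_io_kib(344, 5540)   # moving the VM off the ceph backed datastore

print(f"avg write size: {write_kib:.1f} KiB")
print(f"avg read size:  {read_kib:.1f} KiB")
```

Both directions come out at roughly 62-64 KiB per op, so the op size looks about the same each way and the gap is in how many ops per second complete. These are ops as counted by the cluster, which may not match the iscsi command size one to one.
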
If you run esxtop while running your test what do you see for the number
of commands in the iscsi LUN's queue?

> I'm going to test bonnie++ with an rbd volume mounted directly on the

To isolate whether it's iscsi or rbd that is slow, you need to run fio with
the librbd io engine. We know krbd is going to be the fastest. ceph-iscsi
uses librbd so it is a better baseline. If you are not familiar with fio
you can just do something like:

fio --group_reporting --ioengine=rbd --direct=1 --name=librbdtest \
    --numjobs=32 --bs=512k --iodepth=128 --size=10G --rw=write \
    --rbdname=name_of_your_image --pool=name_of_pool
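
If you want to see how the librbd baseline scales with queue depth, you could run the same test at a few iodepth values. This is only a sketch that prints the command lines; the helper name fio_rbd_cmd and the list of depths are made up for illustration, and the pool and image names are placeholders. fio's rbd engine takes the image name via rbdname. Note the write test overwrites data in the image, so point it at a scratch image.

```python
import shlex


def fio_rbd_cmd(pool: str, image: str, iodepth: int, numjobs: int = 32) -> list:
    """Build a fio command line for a librbd write test at the given iodepth.

    Hypothetical helper: it mirrors the fio options shown above and only
    varies iodepth (and optionally numjobs).
    """
    return [
        "fio", "--group_reporting", "--ioengine=rbd", "--direct=1",
        f"--name=librbdtest-qd{iodepth}", f"--numjobs={numjobs}",
        "--bs=512k", f"--iodepth={iodepth}", "--size=10G", "--rw=write",
        f"--rbdname={image}", f"--pool={pool}",
    ]


# Illustrative depths only; pick values that make sense for your setup.
for depth in (1, 16, 64, 128):
    print(shlex.join(fio_rbd_cmd("name_of_pool", "name_of_your_image", depth)))
```

Run the printed commands one at a time from the iscsi gateway and compare the reported write bandwidth at each depth.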

> iscsi gateway. Also will test bonnie++ inside a VM on a ceph backed
> datastore.
> 
> On Thu, Oct 24, 2019 at 7:15 PM Mike Christie <mchristi@xxxxxxxxxx> wrote:
> 
>     On 10/24/2019 12:22 PM, Ryan wrote:
>     > I'm in the process of testing the iscsi target feature of ceph. The
>     > cluster is running ceph 14.2.4 and ceph-iscsi 3.3. It consists of 5
> 
>     What kernel are you using?
> 
>     > hosts with 12 SSD OSDs per host. Some basic testing moving VMs to a ceph
>     > backed datastore is only showing 60MB/s transfers. However moving these
>     > back off the datastore is fast at 200-300MB/s.
> 
>     What is the workload and what are you using to measure the throughput?
> 
>     If you are using fio, what arguments are you using? And, could you
>     change the ioengine to rbd and re-run the test from the target system so
>     we can check if rbd is slow or iscsi?
> 
>     For small IOs, 60 MB/s is about right.
> 
>     For 128-512K IOs you should be able to get around 300 MB/s for writes
>     and 600 MB/s for reads.
> 
>     1. Increase max_data_area_mb. This is a kernel buffer lio/tcmu uses to
>     pass data between the kernel and tcmu-runner. The default is only 8MB.
> 
>     In gwcli cd to your disk and do:
> 
>     # reconfigure max_data_area_mb N
> 
>     where N is between 8 and 2048 (in MB).
> 
>     2. The Linux kernel target only allows 64 commands per iscsi session by
>     default. We increase that to 128, but you can increase this to 512.
> 
>     In gwcli cd to the target dir and do
> 
>     reconfigure cmdsn_depth 512
> 
>     3. I think ceph-iscsi and lio work better with higher queue depths so if
>     you are using fio you want higher numjobs and/or iodepths.
> 
>     >
>     > What should I be looking at to track down the write performance issue?
>     > In comparison with the Nimble Storage arrays I can see 200-300MB/s in
>     > both directions.
>     >
>     > Thanks,
>     > Ryan
>     >
>     >
>     > _______________________________________________
>     > ceph-users mailing list -- ceph-users@xxxxxxx
>     > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>     >
> 