Re: Very bad performance on a ceph rbd pool via iSCSI to VMware esx

On 02/13/2020 09:56 AM, Salsa wrote:
> I have a 3-host Ceph setup with 10 4TB HDDs per host. I defined a 3-replica RBD pool and some images and presented them to a VMware host via iSCSI, but the write performance is so bad that I managed to freeze a VM doing a big rsync to a datastore inside Ceph and had to reboot its host (it seems I filled up VMware's iSCSI queue).
> 
> Right now I'm getting write latencies from 20 ms to 80 ms (per OSD), sometimes peaking at 600 ms (per OSD).
> Client throughput is around 4 MB/s.

How are you testing client throughput? What tool and args?

> Using a 4 MB, stripe 1 image I got 1.955.359 B/s inside the VM.
> On a 1 MB, stripe 1 image I got 2.323.206 B/s inside the same VM.

How are you getting the latency and throughput values for iSCSI? Is it
esxtop? Are you saying you filled up the VMware iSCSI queue based on the
esxtop queue values, and have you increased settings like the ESX queue
depth as described here:

https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.troubleshooting.doc/GUID-0D774A67-F6AC-4D8A-9E5A-74140F036AD2.html

Note: Sometimes people only increase iscsivmk_LunQDepth, but then forget
to also increase iscsivmk_HostQDepth.
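
If it helps, those are iscsi_vmk module parameters on the ESXi host; a
quick sketch (the values here are only examples, and the change usually
needs a host reboot to take effect):

# Run on the ESXi host
esxcli system module parameters list -m iscsi_vmk | grep -i qdepth
esxcli system module parameters set -m iscsi_vmk -p "iscsivmk_LunQDepth=128 iscsivmk_HostQDepth=128"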

What are your ceph-iscsi, tcmu-runner and kernel versions on the target side?
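
For example, on an RPM-based gateway you could grab those with something
like:

# Run on each iSCSI gateway node
rpm -q ceph-iscsi tcmu-runner
uname -r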

Here are some general tweaks for the target side:

1. Increase the max_data_area_mb (this affects the LUN's max queueing)
and target side queue depth.

# gwcli
cd /disks
reconfigure rbd/<image_name> max_data_area_mb 128

# gwcli
cd /iscsi-target/iqn.2003-01.com.redhat.iscsi-gw:ceph-igw
reconfigure iqn.2003-01.com.redhat.iscsi-gw:ceph-igw cmdsn_depth 512
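
To confirm the new settings took effect, you can list them from the same
gwcli prompt (the exact output layout varies by ceph-iscsi version):

# gwcli
ls /disks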

2. Are the VMs on the same iscsi LUN or different ones?

If they are on the same LUN, then increasing the max_data_area_mb value
will help: with smaller values we get lots of qfulls, and with a larger
value latency will be better since IO is not sitting in the target-side
queue waiting for memory. On the initiator side, though, it is sometimes
helpful for testing to disable the VMware SIOC and adaptive queueing
features.
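
Adaptive queueing is controlled per device with the queue-full settings
(SIOC itself is toggled per datastore in vCenter). Something along these
lines, assuming the options from VMware's adaptive queueing docs, should
turn it off for a device:

# Run on the ESXi host; a sample size of 0 disables adaptive queueing
esxcli storage core device set --device yourdevice --queue-full-sample-size 0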

3. Did you check basics like whether multipathing is set up correctly?

esxcli storage nmp path list -d yourdevice

shows all the expected paths?

On the initiator and target side, did you check the logs for any errors
while running your test?
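
For example (log locations and service names below assume a default ESXi
install and a standard ceph-iscsi deployment):

# ESXi host
tail /var/log/vmkernel.log

# iSCSI gateway nodes
journalctl -u tcmu-runner --since "1 hour ago"
journalctl -u rbd-target-api --since "1 hour ago"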

4. What test tool are you running, and with what args? If you just run a
plain fio command like:

fio --filename=some_new_file --bs=128K --size=5G --name=test \
    --iodepth=128 --direct=1 --numjobs=8 --rw=read --ioengine=libaio

what do you get?

If you run almost the same fio command from the gateway machine or a VM,
but use --ioengine=rbd:

fio --bs=128K --size=5G --name=test --iodepth=128 --direct=1 --numjobs=8 \
    --rw=read --ioengine=rbd --rbdname=your_image --pool=your-pool_rbd

how does it compare?