‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, February 14, 2020 4:49 PM, Mike Christie <mchristi@xxxxxxxxxx> wrote:

> On 02/13/2020 09:56 AM, Salsa wrote:
>
> > I have a 3-host ceph storage setup with 10 4TB HDDs per host. I defined a
> > 3-replica rbd pool and some images and presented them to a VMware host via
> > iSCSI, but the write performance is so bad that I managed to freeze a VM
> > doing a big rsync to a datastore inside ceph and had to reboot its host
> > (it seems I filled up VMware's iSCSI queue).
> > Right now I'm getting write latencies from 20 ms to 80 ms (per OSD),
> > sometimes peaking at 600 ms (per OSD).
> > Client throughput is giving me around 4 MB/s.
>
> How are you testing client throughput? What tool and args?

Not testing. This is what my ceph grafana dashboard shows while I'm writing to the LUN (iSCSI LUN -> VMware -> datastore -> VM I/O).

> > Using a 4MB stripe 1 image I got 1.955.359 B/s inside the VM.
> > On a 1MB stripe 1 image I got 2.323.206 B/s inside the same VM.
>
> How are you getting the latency and throughput values for iscsi? Is it
> esxtop? Were you saying you filled up the vmware iscsi queue based on
> the esxtop queue values, and have you increased values like the ESX
> queue depth value like here:
>
> https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.troubleshooting.doc/GUID-0D774A67-F6AC-4D8A-9E5A-74140F036AD2.html
>

Latency and throughput are also from ceph grafana, but I checked the esxtop latency and it is almost the same. About filling the queue: I think I filled it because ESXi froze and I had to reboot the host (hardware). I haven't increased the queue depth, as I imagined it would only take longer to fill up and freeze ESXi again.

> Note: Sometimes people only increase iscsivmk_LunQDepth, but then forget
> to also increase iscsivmk_HostQDepth.
>
> What is your ceph-iscsi, tcmu-runner and kernel version on the target side?
>

tcmu-runner 1.5.2
Ceph 14.2.6
Linux ceph01 3.10.0-1062.1.1.el7.x86_64 #1 SMP Fri Sep 13 22:55:44 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

> Here are some general tweaks for the target side:
>
> 1. Increase the max_data_area_mb (this affects the LUN's max queueing)
> and the target side queue depth.
>
> gwcli
> ======
> cd /disks
> reconfigure rbd/<image_name> max_data_area_mb 128
>
> gwcli
> ======
> cd /iscsi-target/iqn.2003-01.com.redhat.iscsi-gw:ceph-igw
> reconfigure iqn.2003-01.com.redhat.iscsi-gw:ceph-igw cmdsn_depth 512
>
> 2. Are the VMs on the same iscsi LUN or different ones?
>
> If on the same LUN then increasing the max_data_area_mb value will help,
> because with smaller values we will get lots of qfulls and latency will
> be better since IO is not sitting in the target side queue waiting for
> memory. On the initiator side though it is still sometimes helpful for
> testing to disable the vmware SIOC and adaptive queueing features.
>
> 3. Did you check the basics, like that multipathing is set up correctly?
>
> esxcli storage nmp path list -d yourdevice
>
> shows all the expected paths?
>
> On the initiator and target side, did you check the logs for any errors
> going on when you run your test?
>
> 4. What test tool are you running and what args? If you just run a
> plain fio command like:
>
> fio --filename=some_new_file --bs=128K --size=5G --name=test
> --iodepth=128 --direct=1 --numjobs=8 --rw=read --ioengine=libaio
>
> what do you get?
>
> If you run almost the same fio command from the gateway machine or vm,
> but use --ioengine=rbd
>
> fio --bs=128K --size=5G --name=test --iodepth=128 --direct=1 --numjobs=8
> --rw=read --ioengine=rbd --rbdname=your_image --pool=your-pool_rbd
>
> how does it compare?

Will try these tweaks. Only one VM so far. Multipathing is correct, and I am not using test tools; this is real-world usage.
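For reference, if I do end up raising the ESXi queue depths later, I assume it would be roughly the following on the software iSCSI adapter (untested on my hosts; 128/256 are only example values, and the module parameters only take effect after a host reboot):

# set the per-LUN and per-adapter queue depths for the software iSCSI initiator
esxcli system module parameters set -m iscsi_vmk -p "iscsivmk_LunQDepth=128 iscsivmk_HostQDepth=256"
# confirm what is currently configured
esxcli system module parameters list -m iscsi_vmk | grep QDepth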
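I may also try a quick write benchmark directly against the cluster to separate RBD/OSD performance from the iSCSI path; something like the commands below (just my guess at a reasonable sanity check -- replace "rbd" with the actual pool name backing the images):

# 10-second write benchmark with default 4MB objects against the backing pool
rados bench -p rbd 10 write
# snapshot of per-OSD commit/apply latencies to compare with the grafana numbers
ceph osd perf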
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx