Re: ceph + vmware

I'm using this setup with ESXi 5.1 and I get very good performance.  I suspect you have other issues.  Reliability is another story (see Nick's posts on tgt and HA to get an idea of the awful problems you can have), but for my test labs the risk is acceptable.


One change I found helpful is to run tgtd with 128 I/O threads.  I'm running Ubuntu 14.04, so I edited my /etc/init/tgt.conf file and changed the line that read:

exec tgtd

to 

exec tgtd --nr_iothreads=128
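
tgt needs to be restarted for the new thread count to take effect (this will drop any active iSCSI sessions).  Something like this should do it and let you confirm the option stuck, assuming the stock Upstart job on 14.04:

sudo service tgt restart
ps -ef | grep '[t]gtd'     # should now show --nr_iothreads=128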


If you're not concerned with reliability, you can enhance throughput even more by enabling rbd client write-back cache in your tgt VM's ceph.conf file (you'll need to restart tgtd for this to take effect):

[client]
rbd_cache = true
rbd_cache_size = 67108864 # (64MB)
rbd_cache_max_dirty = 50331648 # (48MB)
rbd_cache_target_dirty = 33554432 # (32MB)
rbd_cache_max_dirty_age = 2
rbd_cache_writethrough_until_flush = false
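
If you want to confirm tgtd actually picked these settings up, one option (just a sketch; the admin socket line follows the example in the Ceph docs and wasn't part of my original setup) is to add an admin socket under the same [client] section and query it after restarting tgtd.  The socket file name includes the pid, so it varies per run:

[client]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok

ceph --admin-daemon /var/run/ceph/ceph-client.admin.<pid>.<cctid>.asok config show | grep rbd_cache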




Here's a sample targets.conf:

<target iqn.2014-04.tgt.Charter>
    initiator-address ALL
    scsi_sn Charter
    #vendor_id CEPH
    #controller_tid 1
    write-cache on
    read-cache on
    driver iscsi
    bs-type rbd

    <backing-store charter/vmguest>
        lun 5
        scsi_id cfe1000c4a71e700506357
    </backing-store>
    <backing-store charter/voting>
        lun 6
        scsi_id cfe1000c4a71e700507157
    </backing-store>
    <backing-store charter/oradata>
        lun 7
        scsi_id cfe1000c4a71e70050da7a
    </backing-store>
    <backing-store charter/oraback>
        lun 8
        scsi_id cfe1000c4a71e70050bac0
    </backing-store>
</target>
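
After editing targets.conf, something like this should apply the changes and let you confirm that the LUNs are exported (tgt-admin may need -f/--force if initiators are still logged in):

sudo tgt-admin --update ALL
sudo tgtadm --lld iscsi --mode target --op show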



I don't have fio numbers handy, but I have some Oracle calibrate I/O output.

We're running Oracle RAC database servers in Linux VMs on ESXi 5.1, which use iSCSI to connect to the tgt service.  I only have a single connection set up in ESXi for each LUN.  I tested multipathing with two tgt VMs presenting identical LUNs/RBD disks, but found that there wasn't a significant performance gain from doing this, even with round-robin path selection in VMware.


These tests were run from two RAC VMs, each on a different host, with both hosts connected to the same tgt instance.  The way we have Oracle configured, it would have been using two of the LUNs heavily during this calibrate I/O test.
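
For context, the output below comes from Oracle's I/O calibration (DBMS_RESOURCE_MANAGER.CALIBRATE_IO).  A rough sketch of the kind of run that produces it, assuming SYSDBA access and async I/O enabled (the 75 disks and 20 ms latency simply mirror our setup):

sqlplus / as sysdba <<'EOF'
SET SERVEROUTPUT ON
DECLARE
  l_max_iops       PLS_INTEGER;
  l_max_mbps       PLS_INTEGER;
  l_actual_latency PLS_INTEGER;
BEGIN
  -- num_physical_disks roughly matches the 75 OSDs behind the LUNs
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO(
    num_physical_disks => 75,
    max_latency        => 20,
    max_iops           => l_max_iops,
    max_mbps           => l_max_mbps,
    actual_latency     => l_actual_latency);
  DBMS_OUTPUT.PUT_LINE('max_iops=' || l_max_iops ||
                       ' max_mbps=' || l_max_mbps ||
                       ' latency=' || l_actual_latency);
END;
/
-- the START_TIME / MAX_IOPS / MAX_MBPS rows below come from this view
SELECT * FROM dba_rsrc_io_calibrate;
EOF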


This output is with 128 threads in tgtd and rbd client cache enabled:
START_TIME           END_TIME               MAX_IOPS   MAX_MBPS  MAX_PMBPS    LATENCY      DISKS
-------------------- -------------------- ---------- ---------- ---------- ---------- ----------
28-JUN-016 15:10:50  28-JUN-016 15:20:04       14153        658        412         14         75

This output is with the same configuration, but with rbd client cache disabled:
START_TIME           END_TIME               MAX_IOPS   MAX_MBPS  MAX_PMBPS    LATENCY      DISKS
-------------------- -------------------- ---------- ---------- ---------- ---------- ----------
28-JUN-016 22:44:29  28-JUN-016 22:49:05        7449        161        219         20         75

This output is from a directly connected EMC VNX5100 FC SAN with 25 disks using dual 8Gb FC links on a different lab system:
START_TIME           END_TIME               MAX_IOPS   MAX_MBPS  MAX_PMBPS    LATENCY      DISKS
-------------------- -------------------- ---------- ---------- ---------- ---------- ----------
28-JUN-016 22:11:25  28-JUN-016 22:18:48        6487        299        224         19         75

One of our goals for our Ceph cluster is to replace the EMC SANs.  Performance-wise we've accomplished this; the next step is to get a plausible iSCSI HA solution working.  I'm very interested in what Mike Christie is putting together.  I'm in the process of vetting the SUSE solution now.

BTW - the tests were run when we had 75 OSDs, all 7200RPM 2TB HDs, across 9 OSD hosts.  We have no SSD journals; instead, all the disks are set up as single-disk RAID1 disk groups with write-back (WB) cache backed by a BBU.  All OSD hosts have 40Gb networking and the ESXi hosts have 10Gb.

Jake


On Mon, Jul 11, 2016 at 12:06 PM, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
Hi Mike,

I was trying:

https://ceph.com/dev-notes/adding-support-for-rbd-to-stgt/

ONE target, exported directly from different OSD servers, to multiple
VMware ESXi servers.

A config looked like:

#cat iqn.ceph-cluster_netzlaboranten-storage.conf

<target iqn.ceph-cluster:vmware-storage>
driver iscsi
bs-type rbd
backing-store rbd/vmware-storage
initiator-address 10.0.0.9
initiator-address 10.0.0.10
incominguser vmwaren-storage RPb18P0xAqkAw4M1
</target>


We had 4 OSD servers; each one had this config running.
We had 2 VMware servers (ESXi).

So we had 4 paths to this vmware-storage RBD object.

In the end, VMware saw 8 paths: the 4 paths connected directly to the
specific VMware server, plus the 4 paths that server saw via the other
VMware server.

There were very big performance problems; I am talking about < 10 MB/s.
The customer was not able to use it, so good old NFS is serving instead.

At that time we used Ceph Hammer, and I think the customer was using ESXi
5.5, or maybe ESXi 6; the testing was sometime last year.

--------------------

We will make a new attempt now with Ceph Jewel and ESXi 6, and this time
we will manage the VMware servers ourselves.

As soon as the issue I already mailed to this list,

"ceph mon Segmentation fault after set crush_ruleset ceph 10.2.2"

is fixed, we can start the testing.


--
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Address:

IP Interactive UG (haftungsbeschraenkt)
Zum Sonnenberg 1-3
63571 Gelnhausen

Registered as HRB 93402 at the Amtsgericht (district court) Hanau
Managing Director: Oliver Dzombic

Tax No.: 35 236 3622 1
VAT ID: DE274086107


Am 11.07.2016 um 17:45 schrieb Mike Christie:
> On 07/08/2016 02:22 PM, Oliver Dzombic wrote:
>> Hi,
>>
>> Does anyone have experience with connecting VMware to Ceph in a smart way?
>>
>> iSCSI multipath did not really work well.
>
> Are you trying to export rbd images from multiple iscsi targets at the
> same time or just one target?
>
> For the HA/multiple target setup, I am working on this for Red Hat. We
> plan to release it in RHEL 7.3/RHCS 2.1. SUSE ships something already as
> someone mentioned.
>
> We just got a large chunk of code in the upstream kernel (it is in the
> block layer maintainer's tree for the next kernel) so it should be
> simple to add COMPARE_AND_WRITE support now. We should be posting krbd
> exclusive lock support in the next couple weeks.
>
>
>> NFS could work, but I think that's just too many layers in between to
>> get usable performance.
>>
>> Systems like ScaleIO have developed a VMware addon to talk to them.
>>
>> Is there something similar out there for Ceph?
>>
>> What are you using?
>>
>> Thank you!
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

