Re: ceph + vmware

Hi,

I am currently trying this out.

My tgt config:

# cat tgtd.conf
# The default config file
include /etc/tgt/targets.conf

# Config files from other packages etc.
include /etc/tgt/conf.d/*.conf

nr_iothreads=128


-----

# cat iqn.2016-07.tgt.esxi-test.conf
<target iqn.2016-07.tgt.esxi-test>
  initiator-address ALL
  scsi_sn esxi-test
  #vendor_id CEPH
  #controller_tid 1
  write-cache on
  read-cache on
  driver iscsi
  bs-type rbd
  <backing-store vmware1/esxi-test>
  lun 1
  scsi_id cf10000c4a71e700506357
  </backing-store>
  </target>
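
As a sanity check, something along these lines confirms that the backing image
exists and that tgtd actually exported the LUN (assuming the standard rbd and
tgtadm CLIs):

# confirm the RBD image the backing-store points at
rbd info vmware1/esxi-test

# list targets, LUNs and connected initiators as tgtd sees them
tgtadm --lld iscsi --mode target --op show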


--------------


If I create a VM inside ESXi 6 and try to format the virtual HDD, I see
this in the logs:

sd:2:0:0:0: [sda] CDB:
Write(10): 2a 00 0f 86 a8 80 00 01 40 00
mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff880068aa5e00)
mptscsih: ioc0: attempting task abort! ( sc=ffff880068aa4a80)

That is with the LSI HDD emulation. With the VMware paravirtual SCSI
adapter, everything just freezes.
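
Sanity checks on the Ceph/tgt side while the format runs would be something
like this (assuming the standard ceph CLI and sysstat):

# look for slow/blocked requests and high per-OSD latency during the write load
ceph health detail
ceph osd perf

# on the OSD hosts: are the disks/journals saturated?
iostat -x 1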

Any idea what is causing this issue?

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


On 11.07.2016 at 22:24, Jake Young wrote:
> I'm using this setup with ESXi 5.1 and I get very good performance.  I
> suspect you have other issues.  Reliability is another story (see Nick's
> posts on tgt and HA to get an idea of the awful problems you can have),
> but for my test labs the risk is acceptable.
> 
> 
> One change I found helpful is to run tgtd with 128 threads.  I'm running
> Ubuntu 14.04, so I edited my /etc/init/tgt.conf file and changed the
> line that read:
> 
> exec tgtd
> 
> to 
> 
> exec tgtd --nr_iothreads=128
> 
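> A quick way to apply and verify that (assuming the upstart job is named
> "tgt", as in the stock Ubuntu 14.04 package):
> 
> sudo restart tgt                     # or: sudo service tgt restart
> ps -eLf | grep '[t]gtd' | wc -l      # thread count should jump to roughly 128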
> 
> If you're not concerned with reliability, you can enhance throughput
> even more by enabling rbd client write-back cache in your tgt VM's
> ceph.conf file (you'll need to restart tgtd for this to take effect):
> 
> [client]
> rbd_cache = true
> rbd_cache_size = 67108864 # (64MB)
> rbd_cache_max_dirty = 50331648 # (48MB)
> rbd_cache_target_dirty = 33554432 # (32MB)
> rbd_cache_max_dirty_age = 2
> rbd_cache_writethrough_until_flush = false
> 
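> If you also add "admin socket = /var/run/ceph/$name.$pid.asok" under that
> [client] section, you can confirm after the tgtd restart that librbd really
> picked the settings up, e.g.:
> 
> ceph --admin-daemon /var/run/ceph/client.admin.<tgtd pid>.asok config show | grep rbd_cache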
> 
> 
> 
> Here's a sample targets.conf:
> 
>   <target iqn.2014-04.tgt.Charter>
>   initiator-address ALL
>   scsi_sn Charter
>   #vendor_id CEPH
>   #controller_tid 1
>   write-cache on
>   read-cache on
>   driver iscsi
>   bs-type rbd
>   <backing-store charter/vmguest>
>   lun 5
>   scsi_id cfe1000c4a71e700506357
>   </backing-store>
>   <backing-store charter/voting>
>   lun 6
>   scsi_id cfe1000c4a71e700507157
>   </backing-store>
>   <backing-store charter/oradata>
>   lun 7
>   scsi_id cfe1000c4a71e70050da7a
>   </backing-store>
>   <backing-store charter/oraback>
>   lun 8
>   scsi_id cfe1000c4a71e70050bac0
>   </backing-store>
>   </target>
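> 
> Side note: as far as I understand, ESXi identifies a LUN by the identity the
> target reports, so a stable, unique scsi_id per backing-store (as above)
> matters when more than one gateway exports the same RBD images.  Reloading
> the config after edits is something like:
> 
> tgt-admin --update ALL     # re-read targets.conf and apply changes
> tgt-admin --show           # verify targets, LUNs and scsi_id values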
> 
> 
> 
> I don't have FIO numbers handy, but I have some Oracle calibrate IO
> output.
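> 
> For context, that output comes from Oracle's I/O calibration, roughly the
> PL/SQL below (the rows shown are then read back from DBA_RSRC_IO_CALIBRATE):
> 
> DECLARE
>   l_max_iops  PLS_INTEGER;
>   l_max_mbps  PLS_INTEGER;
>   l_latency   PLS_INTEGER;
> BEGIN
>   -- 75 physical disks, 20 ms target latency
>   DBMS_RESOURCE_MANAGER.CALIBRATE_IO(75, 20, l_max_iops, l_max_mbps, l_latency);
> END;
> /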
> 
> We're running Oracle RAC database servers in Linux VMs on ESXi 5.1,
> which use iSCSI to connect to the tgt service.  I only have a single
> connection set up in ESXi for each LUN.  I tested multipathing with two
> tgt VMs presenting identical LUNs/RBD disks, but found that there wasn't
> a significant performance gain from doing this, even with round-robin
> path selection in VMware.
> 
> 
> These tests were run from two RAC VMs, each on a different host, with
> both hosts connected to the same tgt instance.  The way we have Oracle
> configured, it would have been using two of the LUNs heavily during this
> calibrate IO test.
> 
> 
> This output is with 128 threads in tgtd and rbd client cache enabled:
> 
> START_TIME           END_TIME               MAX_IOPS   MAX_MBPS  MAX_PMBPS    LATENCY      DISKS
> -------------------- -------------------- ---------- ---------- ---------- ---------- ----------
> 28-JUN-016 15:10:50  28-JUN-016 15:20:04       14153        658        412         14         75
> 
> 
> This output is with the same configuration, but with rbd client cache
> disabled:
> 
> START_TIME           END_TIME               MAX_IOPS   MAX_MBPS  MAX_PMBPS    LATENCY      DISKS
> -------------------- -------------------- ---------- ---------- ---------- ---------- ----------
> 28-JUN-016 22:44:29  28-JUN-016 22:49:05        7449        161        219         20         75
> 
> This output is from a directly connected EMC VNX5100 FC SAN with 25
> disks using dual 8Gb FC links on a different lab system:
> 
> START_TIME           END_TIME               MAX_IOPS   MAX_MBPS  MAX_PMBPS    LATENCY      DISKS
> -------------------- -------------------- ---------- ---------- ---------- ---------- ----------
> 28-JUN-016 22:11:25  28-JUN-016 22:18:48        6487        299        224         19         75
> 
> 
> One of our goals for our Ceph cluster is to replace the EMC SANs.  We've
> accomplished this performance-wise; the next step is to get a plausible
> iSCSI HA solution working.  I'm very interested in what Mike Christie is
> putting together.  I'm in the process of vetting the SUSE solution now.
> 
> BTW - The tests were run when we had 75 OSDs, which are all 7200 RPM 2TB
> HDDs, across 9 OSD hosts.  We have no SSD journals; instead all the disks
> are set up as single-disk RAID1 disk groups with write-back cache backed
> by a BBU.  All OSD hosts have 40Gb networking and the ESXi hosts have 10Gb.
> 
> Jake
> 
> 
> On Mon, Jul 11, 2016 at 12:06 PM, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
> 
>     Hi Mike,
> 
>     I was trying this:
> 
>     https://ceph.com/dev-notes/adding-support-for-rbd-to-stgt/
> 
>     ONE target, exported directly from different OSD servers, to multiple
>     VMware ESXi servers.
> 
>     A config looked like:
> 
>     #cat iqn.ceph-cluster_netzlaboranten-storage.conf
> 
>     <target iqn.ceph-cluster:vmware-storage>
>     driver iscsi
>     bs-type rbd
>     backing-store rbd/vmware-storage
>     initiator-address 10.0.0.9
>     initiator-address 10.0.0.10
>     incominguser vmwaren-storage RPb18P0xAqkAw4M1
>     </target>
> 
> 
>     We had 4 OSD servers, and each of them was running this config.
>     We had 2 VMware ( ESXi ) servers.
> 
>     So we had 4 paths to this vmware-storage RBD object.
> 
>     In the very end, each VMware host saw 8 paths: 4 paths connected directly
>     to that specific VMware host, plus 4 paths it saw via the other VMware
>     host.
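> 
>     For what it's worth, the path policy ESXi applies to those paths can be
>     checked and changed per device, e.g. pinning a single path while testing
>     (the naa id below is just a placeholder):
> 
>     esxcli storage nmp device list
>     esxcli storage nmp device set --device naa.<device id> --psp VMW_PSP_FIXED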
> 
>     There were very big problems with performance, I am talking about < 10
>     MB/s. The customer was not able to use it, so good old NFS is serving
>     instead.
> 
>     At that time we used Ceph Hammer, and I think the customer was using ESXi
>     5.5, or maybe ESXi 6; the testing was sometime last year.
> 
>     --------------------
> 
>     We will make a new attempt now with Ceph Jewel and ESXi 6, and this time
>     we will manage the VMware servers ourselves.
> 
>     As soon as this issue,
> 
>     "ceph mon Segmentation fault after set crush_ruleset ceph 10.2.2"
> 
>     which I already mailed to the list here, is solved, we can start the
>     testing.
> 
> 
>     --
>     Mit freundlichen Gruessen / Best regards
> 
>     Oliver Dzombic
>     IP-Interactive
> 
>     mailto:info@xxxxxxxxxxxxxxxxx
> 
>     Anschrift:
> 
>     IP Interactive UG ( haftungsbeschraenkt )
>     Zum Sonnenberg 1-3
>     63571 Gelnhausen
> 
>     HRB 93402 beim Amtsgericht Hanau
>     Geschäftsführung: Oliver Dzombic
> 
>     Steuer Nr.: 35 236 3622 1
>     UST ID: DE274086107
> 
> 
>     On 11.07.2016 at 17:45, Mike Christie wrote:
>     > On 07/08/2016 02:22 PM, Oliver Dzombic wrote:
>     >> Hi,
>     >>
>     >> Does anyone have experience with a smart way to connect VMware with Ceph?
>     >>
>     >> iSCSI multipath did not really work well.
>     >
>     > Are you trying to export rbd images from multiple iscsi targets at the
>     > same time or just one target?
>     >
>     > For the HA/multiple target setup, I am working on this for Red Hat. We
>     > plan to release it in RHEL 7.3/RHCS 2.1. SUSE ships something already,
>     > as someone mentioned.
>     >
>     > We just got a large chunk of code in the upstream kernel (it is in the
>     > block layer maintainer's tree for the next kernel) so it should be
>     > simple to add COMPARE_AND_WRITE support now. We should be posting krbd
>     > exclusive lock support in the next couple weeks.
>     >
>     >
>     >> NFS could be an option, but I think that's just too many layers in
>     >> between to get some usable performance.
>     >>
>     >> Systems like ScaleIO have developed a VMware addon to talk to them.
>     >>
>     >> Is there something similar out there for Ceph?
>     >>
>     >> What are you using?
>     >>
>     >> Thank you!
>     >>
>     >
>     _______________________________________________
>     ceph-users mailing list
>     ceph-users@xxxxxxxxxxxxxx
>     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



