Re: Ceph, LIO, VMWARE anyone?

Zoltan Arnold Nagy <zoltan@xxxxxxxxxxxxxxxxxx> · Fri, 23 Jan 2015 14:46:46 +0100



    Just to chime in: it will look fine, feel fine, but underneath it's
    quite easy to get VMFS corruption. Happened in our tests.

    Also if you're running LIO, from time to time expect a kernel panic
    (haven't tried with the latest upstream, as I've been using
    Ubuntu 14.04 on my "export" hosts for the test, so might have
    improved...).

    
    As of now I would not recommend this setup without being aware of
    the risks involved.

    
    There have been a few upstream patches getting the LIO code in
    better cluster-aware shape, but no idea if they have been merged
    yet. I know Red Hat has a guy on this.

    
    On 01/21/2015 02:40 PM, Nick Fisk wrote:

    
        Hi Jake,

        Thanks for this. I have been going through it and have a pretty
        good idea of what you are doing now. However, I may be missing
        something looking through your scripts, as I’m still not
        quite understanding how you make sure locking happens
        with the ESXi ATS SCSI command.
         
        From this slide (Page 8):

        http://xo4t.mjt.lu/link/xo4t/gzyhtx3/1/_9gJVMUrSdvzGXYaZfCkVA/aHR0cHM6Ly93aWtpLmNlcGguY29tL0BhcGkvZGVraS9maWxlcy8zOC9oYW1tZXItY2VwaC1kZXZlbC1zdW1taXQtc2NzaS10YXJnZXQtY2x1c3RlcmluZy5wZGY
         
        It seems to indicate that for a true active/active setup the
        two targets need to be aware of each other and exchange
        locking information for it to work reliably. I’ve also
        watched the video from the Ceph developer summit where this
        is discussed, and it seems that Ceph and the kernel need changes
        to allow this locking to be pushed back to the RBD layer so it
        can be shared. From what I can see browsing through the
        Linux Git repo, these patches haven’t made the mainline
        kernel yet.
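
To make the concern concrete, here is a toy Python model of the race. It is only a sketch under assumptions: `SharedImage` stands in for an RBD image and `Target` for an iSCSI target node; both names are hypothetical and no real Ceph, LIO or tgt API is used. ATS is modelled as a compare step followed by a write step, and each target only serializes commands that arrive at itself.

```python
import threading


class SharedImage:
    """One block of shared storage, visible to every target (hypothetical)."""

    def __init__(self, value=b"free"):
        self.block = value


class Target:
    """A target that serializes ATS only against its own initiators."""

    def __init__(self, image):
        self.image = image
        self.local_lock = threading.Lock()  # NOT shared with the peer target

    def compare(self, expected):
        # Step 1 of the ATS model: read the block and compare it.
        return self.image.block == expected

    def write(self, new):
        # Step 2: write the new value (caller checked the compare result).
        self.image.block = new

    def ats(self, expected, new):
        # Atomic only with respect to commands arriving at THIS target.
        with self.local_lock:
            if self.compare(expected):
                self.write(new)
                return True
            return False


def interleaved_ats(target_a, target_b):
    """Replay the bad interleaving: both compares run before either write."""
    a_ok = target_a.compare(b"free")
    b_ok = target_b.compare(b"free")
    if a_ok:
        target_a.write(b"host-A")
    if b_ok:
        target_b.write(b"host-B")
    return a_ok, b_ok


image = SharedImage()
a_ok, b_ok = interleaved_ats(Target(image), Target(image))
print(a_ok, b_ok, image.block)
```

In this model both hosts are told their ATS succeeded, yet only the last writer's value is on disk. Through a single target the second `ats()` call would fail as intended, which is why the lock state would have to live somewhere both targets can see.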
         
        Can you shed any light on this? As tempting as having
        active/active is, I’m wary about using the configuration
        until I understand how the locking is working and if fringe
        cases involving multiple ESXi hosts writing to the same LUN
        on different targets could spell disaster.

        Many thanks,
        Nick
         
        From: Jake Young [mailto:jak3kaj@xxxxxxxxx]
        Sent: 14 January 2015 16:54
        To: Nick Fisk
        Cc: Giuseppe Civitella; ceph-users
        Subject: Re: Ceph, LIO, VMWARE anyone?
         
        
            Yes, it's active/active and I found
              that VMWare can switch from path to path with no issues or
              service impact.
          
          
          I posted some config files here: github.com/jak3kaj/misc
          
             
            One set is from my LIO nodes, both the
              primary and secondary configs, so you can see what I needed
              to make unique.  The other set (targets.conf) is from my
              tgt nodes.  They are both 4 LUN configs.
          
          
            Like I said in my previous email, there
              is no performance difference between LIO and tgt.  The
              only service I'm running on these nodes is a single iscsi
              target instance (either LIO or tgt).
          
          
            Jake
          
        
            On Wed, Jan 14, 2015 at 8:41 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
            
              
                  Hi Jake,

                  I can’t remember the exact details, but it was
                  something to do with a potential problem when
                  using the pacemaker resource agents. I think it
                  was to do with a potential hanging issue when one
                  LUN on a shared target failed and then it tried to
                  kill all the other LUNs to fail the target over to
                  another host. This then leaves the TCM part of LIO
                  locking the RBD, which also can’t fail over.

                  That said, I did try multiple LUNs on one target as a
                  test and didn’t experience any problems.

                  I’m interested in the way you have your setup
                  configured though. Are you saying you effectively
                  have an active/active configuration with a path
                  going to either host, or are you failing the iSCSI
                  IP between hosts? If it’s the former, have you had
                  any problems with SCSI locking, reservations, etc.
                  between the two targets?

                  I can see the advantage to that configuration as you
                  reduce/eliminate a lot of the troubles I have had
                  with resources failing over.

                  Nick
                   
                  From: Jake Young [mailto:jak3kaj@xxxxxxxxx]
                  Sent: 14 January 2015 12:50
                  To: Nick Fisk
                  Cc: Giuseppe Civitella; ceph-users
                  Subject: Re: Ceph, LIO, VMWARE anyone?
                  
                    
                      Nick,
                      
                         
                        Where did you read that having more than 1 LUN per
                        target causes stability problems?

                        I am running 4 LUNs per target.

                        For HA I'm running two Linux iSCSI target servers
                        that map the same 4 rbd images. The two
                        targets have the same serial numbers, T10
                        address, etc.  I copy the primary's config to
                        the backup and change IPs. This way VMWare
                        thinks they are different target IPs on the
                        same host. This has worked very well for me.
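
The "copy the primary's config and change IPs" step can be sketched in Python as below. This is an illustration only: the function name `make_backup_config`, the sample config text and the addresses are made up, and the sample is not real LIO or tgt syntax. The point is that only the portal IPs change, while serial numbers and other identifiers are copied unchanged.

```python
import re

# Matches a dotted-quad IPv4 address as a whole token.
IPV4 = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?![\d.])")


def make_backup_config(primary_text, ip_map):
    """Return the primary config text with each mapped IP swapped.

    Addresses missing from ip_map are left as they are.
    """
    def swap(match):
        return ip_map.get(match.group(0), match.group(0))

    return IPV4.sub(swap, primary_text)


# Hypothetical config fragment, for illustration only.
primary = 'portal 192.0.2.10:3260\nserial "abc123"\n'
backup = make_backup_config(primary, {"192.0.2.10": "192.0.2.11"})
print(backup)
```

The actual config files are in the repository mentioned elsewhere in this thread; the sketch only shows the idea of keeping everything identical except the addresses.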
                      
                      
                        One suggestion I have is to try using rbd enabled
                        tgt. The performance is equivalent to LIO, but
                        I found it is much better at recovering from a
                        cluster outage. I've had LIO lock up the
                        kernel or simply not recognize that the rbd
                        images are available, whereas tgt will
                        eventually present the rbd images again.

                        I have been slowly adding servers and am
                        expanding my test setup to a production setup
                        (nice thing about ceph). I now have 6 OSD
                        hosts with 7 disks on each. I'm using the LSI
                        Nytro cache raid controller, so I don't have a
                        separate journal, and I have 40Gb networking. I
                        plan to add another 6 OSD hosts in another
                        rack in the next 6 months (and then another 6
                        next year). I'm doing 3x replication, so I
                        want to end up with 3 racks.
                      
                      
                        Jake

                          
                          On Wednesday, January 14, 2015, Nick Fisk <nick@xxxxxxxxxx> wrote:
                        
                          
                              Hi Giuseppe,

                              I am working on something very similar
                              at the moment. I currently have it
                              working on some test hardware and it
                              seems to be working reasonably well.

                              I say reasonably as I have had a few
                              instabilities, but these are on the HA
                              side; the LIO and RBD side of things
                              have been rock solid so far. The main
                              problems I have had seem to be around
                              recovering from failure, with resources
                              ending up in an unmanaged state. I’m
                              not currently using fencing so this
                              may be part of the cause.

                              As a brief description of my
                              configuration:

                              4 hosts, each having 2 OSDs, also running
                              the monitor role
                              3 additional hosts in an HA cluster which
                              act as iSCSI proxy nodes

                              I’m using the IP, RBD, iSCSITarget and
                              iSCSILUN resource agents to provide an HA
                              iSCSI LUN which maps back to an RBD.
                              All the agents for each RBD are in a
                              group so they follow each other
                              between hosts.

                              I’m using 1 LUN per target as I read
                              somewhere there are stability problems
                              using more than 1 LUN per target.

                              Performance seems OK, I can get about 1.2k
                              random IOs out of the iSCSI LUN. This seems
                              to be about right for the Ceph cluster
                              size, so I don’t think the LIO part is
                              causing any significant overhead.

                              We should be getting our production
                              hardware shortly, which will have 40
                              OSDs with journals and an SSD caching
                              tier, so within the next month or so I
                              will have a better idea of running it
                              in a production environment and the
                              performance of the system.

                              Hope that helps. If you have any questions,
                              please let me know.

                              Nick
                               
                              From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Giuseppe Civitella
                              Sent: 13 January 2015 11:23
                              To: ceph-users
                              Subject: Ceph, LIO, VMWARE anyone?
                               
                              
                                Hi all,

                                I'm working on a lab setup in which
                                Ceph serves rbd images as iSCSI
                                datastores to VMWARE via a LIO box.
                                Has anyone already done something
                                similar and is willing to share
                                some knowledge? Any production
                                deployments? What about LIO's HA and
                                LUN performance?

                                Thanks

                                Giuseppe
                                
                              
      _______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

    