Re: GlusterFS, Pacemaker, OCF resource agents on CentOS 7

Hi Jiffin

Pacemaker clusters allow us to effectively distribute services across multiple computers.
In my case, I am creating an active-passive cluster for my software, and my software relies on Apache, MySQL and GlusterFS. Thus, I want GlusterFS to be controlled by Pacemaker so that:

1. A node can be deemed "bad" if GlusterFS is not running (using constraints to prohibit failover to a bad node; a sketch follows this list)
2. The GlusterFS volume can be mounted automatically on whichever node is active
3. Services all go into standby together
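
To illustrate goal 1 (a rough sketch only - the resource names match the ones in my config below, and whether INFINITY is the right score is exactly the sort of thing I'm unsure about), a colocation constraint can keep the floating IP off any node where glusterd isn't running:

pcs constraint colocation add FloatingIp with glusterd-clone INFINITY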

Is this not the recommended approach? What else should I do?

Thanks


On 08/12/2017 10:17, Jiffin Tony Thottan wrote:

Hi,

Can you please explain what purpose the Pacemaker cluster serves here?

Regards,

Jiffin


On Thursday 07 December 2017 06:59 PM, Tomalak Geret'kal wrote:

Hi guys

I'm wondering if anyone here is using the GlusterFS OCF resource agents with Pacemaker on CentOS 7?

yum install centos-release-gluster
yum install glusterfs-server glusterfs-resource-agents

The reason I ask is that there seem to be a few problems with them on 3.10, and these problems are so severe that I'm struggling to believe I'm not just doing something wrong.

I created my brick (on a volume previously used for DRBD, thus its name):

mkfs.xfs -f /dev/cl/lv_drbd
mkdir -p /gluster/test_brick
mount -t xfs /dev/cl/lv_drbd /gluster

And then my volume (enabling clients to mount it via NFS):

systemctl start glusterd
gluster volume create test_logs replica 2 transport tcp pcmk01-drbd:/gluster/test_brick pcmk02-drbd:/gluster/test_brick
gluster volume start test_logs
gluster volume set test_logs nfs.disable off
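
As a sanity check (optional, but it has caught typos for me before), the volume can be inspected before handing anything over to Pacemaker:

gluster volume info test_logs
gluster volume status test_logs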

And here's where the fun starts.

Firstly, we need to work around bug 1233344* (which was closed when 3.7 went end-of-life but still seems valid in 3.10):

sed -i 's#voldir="/etc/glusterd/vols/${OCF_RESKEY_volname}"#voldir="/var/lib/glusterd/vols/${OCF_RESKEY_volname}"#' /usr/lib/ocf/resource.d/glusterfs/volume
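
To confirm the substitution actually landed:

grep -n voldir /usr/lib/ocf/resource.d/glusterfs/volume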

With that done, I [attempt to] stop GlusterFS so it can be brought under Pacemaker control:

systemctl stop glusterfsd
systemctl stop glusterd
umount /gluster

(I usually have to kill leftover glusterfs processes manually at this point before the unmount works - why doesn't the systemctl stop take care of that?)
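
For completeness, the crude workaround I use (which I'd obviously prefer not to need) is:

pkill glusterfs
umount /gluster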

With the node in standby (just one is online in this example, but another is configured), I then set up the resources:

pcs node standby
pcs resource create gluster_data ocf:heartbeat:Filesystem device="/dev/cl/lv_drbd" directory="/gluster" fstype="xfs"
pcs resource create glusterd ocf:glusterfs:glusterd
pcs resource create gluster_vol ocf:glusterfs:volume volname="test_logs"
pcs resource create test_logs ocf:heartbeat:Filesystem \
    device="localhost:/test_logs" directory="/var/log/test" fstype="nfs" \
    options="vers=3,tcp,nolock,context=system_u:object_r:httpd_sys_content_t:s0" \
    op monitor OCF_CHECK_LEVEL="20"
pcs resource clone glusterd
pcs resource clone gluster_data
pcs resource clone gluster_vol ordered=true
pcs constraint order start gluster_data-clone then start glusterd-clone
pcs constraint order start glusterd-clone then start gluster_vol-clone
pcs constraint order start gluster_vol-clone then start test_logs
pcs constraint colocation add test_logs with FloatingIp INFINITY
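
With everything defined, it doesn't hurt to review the rules and make sure nothing was mistyped:

pcs constraint show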

(note the SELinux wrangling - this is because I have a CGI web application which will later need to read files from the /var/log/test mount)
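
Once the mount is up, a quick way to verify that the context took effect (assuming the context= mount option behaves over NFS as it does locally):

ls -Zd /var/log/test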

At this point, even with the node in standby, it's already failing:

[root@pcmk01 ~]# pcs status
Cluster name: test_cluster
Stack: corosync
Current DC: pcmk01-cr (version 1.1.15-11.el7_3.5-e174ec8) - partition WITHOUT quorum
Last updated: Thu Dec  7 13:20:41 2017          Last change: Thu Dec  7 13:09:33 2017 by root via crm_attribute on pcmk01-cr

2 nodes and 13 resources configured

Online: [ pcmk01-cr ]
OFFLINE: [ pcmk02-cr ]

Full list of resources:

 FloatingIp     (ocf::heartbeat:IPaddr2):       Started pcmk01-cr
 test_logs      (ocf::heartbeat:Filesystem):    Stopped
 Clone Set: glusterd-clone [glusterd]
     Stopped: [ pcmk01-cr pcmk02-cr ]
 Clone Set: gluster_data-clone [gluster_data]
     Stopped: [ pcmk01-cr pcmk02-cr ]
 Clone Set: gluster_vol-clone [gluster_vol]
     gluster_vol        (ocf::glusterfs:volume):        FAILED pcmk01-cr (blocked)
     Stopped: [ pcmk02-cr ]

Failed Actions:
* gluster_data_start_0 on pcmk01-cr 'not configured' (6): call=72, status=complete, exitreason='DANGER! xfs on /dev/cl/lv_drbd is NOT cluster-aware!',
    last-rc-change='Thu Dec  7 13:09:28 2017', queued=0ms, exec=250ms
* gluster_vol_stop_0 on pcmk01-cr 'unknown error' (1): call=60, status=Timed Out, exitreason='none',
    last-rc-change='Thu Dec  7 12:55:11 2017', queued=0ms, exec=20004ms


Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

1. The data mount can't be created? Why?
2. Why is there a volume "stop" command being attempted, and why does it fail?
3. Why is any of this happening in standby? I can't have the resources failing before I've even made the node live! I could understand why a gluster_vol start operation would fail when glusterd is (correctly) stopped, but why is there a *stop* operation? And why does that make the resource "blocked"?

Given the above steps, is there something fundamental I'm missing about how these resource agents should be used? How do *you* configure GlusterFS on Pacemaker?

Any advice appreciated.

Best regards


* https://bugzilla.redhat.com/show_bug.cgi?id=1233344




_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users

