GlusterFS, Pacemaker, OCF resource agents on CentOS 7

"Tomalak Geret'kal" <tom@xxxxxxxxx> · Thu, 7 Dec 2017 13:29:11 +0000



    Hi guys,
    I'm wondering if anyone here is using the GlusterFS OCF resource
      agents with Pacemaker on CentOS 7? I installed them like this:
        yum install centos-release-gluster
        yum install glusterfs-server glusterfs-resource-agents
    The reason I ask is that there seem to be a few problems with
      them on 3.10, but these problems are so severe that I'm struggling
      to believe I'm not just doing something wrong.
    I created my brick (on a volume previously used for DRBD, thus
      its name):
        mkfs.xfs -f /dev/cl/lv_drbd
        mkdir -p /gluster
        mount -t xfs /dev/cl/lv_drbd /gluster
        mkdir -p /gluster/test_brick

    
    And then my volume (enabling clients to mount it via NFS):
        systemctl start glusterd
        gluster volume create test_logs replica 2 transport tcp \
            pcmk01-drbd:/gluster/test_brick pcmk02-drbd:/gluster/test_brick
        gluster volume start test_logs
        gluster volume set test_logs nfs.disable off

    
    And here's where the fun starts.
    Firstly, we need to work around bug 1233344* (which was closed
      when 3.7 went end-of-life but still seems valid in 3.10):
        sed -i \
            's#voldir="/etc/glusterd/vols/${OCF_RESKEY_volname}"#voldir="/var/lib/glusterd/vols/${OCF_RESKEY_volname}"#' \
            /usr/lib/ocf/resource.d/glusterfs/volume
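
    The same substitution can be wrapped so that it keeps a backup and
      can safely be run twice. This is only a sketch: the function name
      patch_voldir is made up for illustration, and it assumes GNU sed
      (for -i), as shipped on CentOS 7.

```shell
# Hypothetical helper (not part of GlusterFS or resource-agents):
# rewrite the voldir line in the given agent script, keeping a .orig backup.
patch_voldir() {
    agent="$1"
    old='voldir="/etc/glusterd/vols/${OCF_RESKEY_volname}"'
    new='voldir="/var/lib/glusterd/vols/${OCF_RESKEY_volname}"'

    # Nothing to do if the agent already points at /var/lib/glusterd.
    if grep -qF "$new" "$agent"; then
        echo "already patched"
        return 0
    fi

    # Refuse to touch a file that does not contain the expected line.
    if ! grep -qF "$old" "$agent"; then
        echo "expected voldir line not found in $agent" >&2
        return 1
    fi

    cp -p "$agent" "$agent.orig"
    sed -i "s#$old#$new#" "$agent"

    # Succeed only if the new line is really there.
    grep -qF "$new" "$agent"
}
```

    It would be called as
      patch_voldir /usr/lib/ocf/resource.d/glusterfs/volume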

    
    With that done, I [attempt to] stop GlusterFS so it can be
      brought under Pacemaker control:

    
        systemctl stop glusterfsd
        systemctl stop glusterd
        umount /gluster
    (I usually have to manually kill glusterfs processes at this
      point before the unmount works - why does the systemctl stop not
      do it?)
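
    A rough way to see which GlusterFS processes are still alive after
      the two stop commands, as a sketch only. It assumes a Linux /proc
      filesystem; the function name remaining_gluster_procs is made up,
      and the three process names it matches are an assumption about
      what the packages run on the node.

```shell
# Hypothetical helper: print "PID NAME" for every process whose command
# name is glusterd, glusterfsd or glusterfs. Prints nothing if none remain.
remaining_gluster_procs() {
    for dir in /proc/[0-9]*; do
        # Processes can exit while we iterate, so skip unreadable entries.
        [ -r "$dir/comm" ] || continue
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        case "$name" in
            glusterd|glusterfsd|glusterfs)
                echo "${dir#/proc/} $name"
                ;;
        esac
    done
}
```

    Anything it prints is a candidate for what keeps /gluster busy when
      the umount fails.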
    With the node in standby (just one is online in this example, but
      another is configured), I then set up the resources:
        pcs node standby
        pcs resource create gluster_data ocf:heartbeat:Filesystem \
            device="/dev/cl/lv_drbd" directory="/gluster" fstype="xfs"
        pcs resource create glusterd ocf:glusterfs:glusterd
        pcs resource create gluster_vol ocf:glusterfs:volume \
            volname="test_logs"
        pcs resource create test_logs ocf:heartbeat:Filesystem \
            device="localhost:/test_logs" \
            directory="/var/log/test" fstype="nfs" \
            options="vers=3,tcp,nolock,context=system_u:object_r:httpd_sys_content_t:s0" \
            op monitor OCF_CHECK_LEVEL="20"
        pcs resource clone glusterd
        pcs resource clone gluster_data
        pcs resource clone gluster_vol ordered=true
        pcs constraint order start gluster_data-clone then start glusterd-clone
        pcs constraint order start glusterd-clone then start gluster_vol-clone
        pcs constraint order start gluster_vol-clone then start test_logs
        pcs constraint colocation add test_logs with FloatingIp INFINITY

    
    (note the SELinux wrangling - this is because I have a CGI web
      application which will later need to read files from the /var/log/test
      mount)
    At this point, even with the node in standby, it's already
      failing:
        [root@pcmk01 ~]# pcs status
        Cluster name: test_cluster
        Stack: corosync
        Current DC: pcmk01-cr (version 1.1.15-11.el7_3.5-e174ec8) - partition WITHOUT quorum
        Last updated: Thu Dec  7 13:20:41 2017          Last change: Thu Dec  7 13:09:33 2017 by root via crm_attribute on pcmk01-cr

        2 nodes and 13 resources configured

        Online: [ pcmk01-cr ]
        OFFLINE: [ pcmk02-cr ]

        Full list of resources:

         FloatingIp     (ocf::heartbeat:IPaddr2):       Started pcmk01-cr
         test_logs      (ocf::heartbeat:Filesystem):    Stopped
         Clone Set: glusterd-clone [glusterd]
             Stopped: [ pcmk01-cr pcmk02-cr ]
         Clone Set: gluster_data-clone [gluster_data]
             Stopped: [ pcmk01-cr pcmk02-cr ]
         Clone Set: gluster_vol-clone [gluster_vol]
             gluster_vol        (ocf::glusterfs:volume):        FAILED pcmk01-cr (blocked)
             Stopped: [ pcmk02-cr ]

        Failed Actions:
        * gluster_data_start_0 on pcmk01-cr 'not configured' (6): call=72, status=complete, exitreason='DANGER! xfs on /dev/cl/lv_drbd is NOT cluster-aware!',
            last-rc-change='Thu Dec  7 13:09:28 2017', queued=0ms, exec=250ms
        * gluster_vol_stop_0 on pcmk01-cr 'unknown error' (1): call=60, status=Timed Out, exitreason='none',
            last-rc-change='Thu Dec  7 12:55:11 2017', queued=0ms, exec=20004ms

        Daemon Status:
          corosync: active/enabled
          pacemaker: active/enabled
          pcsd: active/enabled
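
    When comparing runs, the names of the failed operations can be
      pulled out of a status dump like the one above. This is a sketch
      that assumes the layout shown here (a "Failed Actions:" heading,
      entries starting with "* ", then "Daemon Status:"); the function
      name failed_actions is made up for illustration.

```shell
# Hypothetical helper: read `pcs status` output on stdin and print the
# resource_operation name of each entry in the "Failed Actions:" section.
failed_actions() {
    awk '
        /^ *Failed Actions:/ { in_section = 1; next }   # section starts
        /^ *Daemon Status:/  { in_section = 0 }         # section ends
        in_section && /^ *\* / { print $2 }             # "* name on node ..."
    '
}
```

    For example: pcs status | failed_actions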

      
    1. The data mount can't be created? Why?
    2. Why is there a volume "stop" command being attempted, and why
       does it fail?
    3. Why is any of this happening in standby? I can't have the
       resources failing before I've even made the node live! I could
       understand why a gluster_vol start operation would fail when
       glusterd is (correctly) stopped, but why is there a *stop*
       operation? And why does that make the resource "blocked"?

    
    Given the above steps, is there something fundamental I'm missing
      about how these resource agents should be used? How do *you*
      configure GlusterFS on Pacemaker?
    Any advice appreciated.

    
    Best regards

    
    * https://bugzilla.redhat.com/show_bug.cgi?id=1233344
    
