Re: GlusterFS, Pacemaker, OCF resource agents on CentOS 7

Hi Jiffin

Pacemaker clusters allow us to effectively distribute services across multiple computers.
In my case, I am creating an active-passive cluster for my software, and my software relies on Apache, MySQL and GlusterFS. Thus, I want GlusterFS to be controlled by Pacemaker so that:

1. A node can be deemed "bad" if GlusterFS is not running (using constraints to prohibit failover to a bad node; a sketch follows this list)
2. The GlusterFS volume can be mounted automatically on whichever node is active
3. Services all go into standby together
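
To illustrate goal 1 (a rough sketch only - the resource names match the ones in my config below, and whether INFINITY is the right score is exactly the sort of thing I'm unsure about), a colocation constraint can keep the floating IP off any node where glusterd isn't running:

pcs constraint colocation add FloatingIp with glusterd-clone INFINITY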

Is this not the recommended approach? What else should I do?

Thanks


On 08/12/2017 10:17, Jiffin Tony Thottan wrote:

Hi,

Can you please explain what purpose the Pacemaker cluster serves here?

Regards,

Jiffin


On Thursday 07 December 2017 06:59 PM, Tomalak Geret'kal wrote:

Hi guys

I'm wondering if anyone here is using the GlusterFS OCF resource agents with Pacemaker on CentOS 7?

yum install centos-release-gluster
yum install glusterfs-server glusterfs-resource-agents

The reason I ask is that there seem to be a few problems with them on 3.10, and these problems are so severe that I'm struggling to believe I'm not just doing something wrong.

I created my brick (on a volume previously used for DRBD, thus its name):

mkfs.xfs -f /dev/cl/lv_drbd
mkdir -p /gluster/test_brick
mount -t xfs /dev/cl/lv_drbd /gluster

And then my volume (enabling clients to mount it via NFS):

systemctl start glusterd
gluster volume create test_logs replica 2 transport tcp pcmk01-drbd:/gluster/test_brick pcmk02-drbd:/gluster/test_brick
gluster volume start test_logs
gluster volume set test_logs nfs.disable off
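
As a sanity check (optional, but it has caught typos for me before), the volume can be inspected before handing anything over to Pacemaker:

gluster volume info test_logs
gluster volume status test_logs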

And here's where the fun starts.

Firstly, we need to work around bug 1233344* (which was closed when 3.7 went end-of-life but still seems valid in 3.10):

sed -i 's#voldir="/etc/glusterd/vols/${OCF_RESKEY_volname}"#voldir="/var/lib/glusterd/vols/${OCF_RESKEY_volname}"#' /usr/lib/ocf/resource.d/glusterfs/volume
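
To confirm the substitution actually landed:

grep -n voldir /usr/lib/ocf/resource.d/glusterfs/volume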

With that done, I [attempt to] stop GlusterFS so it can be brought under Pacemaker control:

systemctl stop glusterfsd
systemctl stop glusterd
umount /gluster

(I usually have to kill leftover glusterfs processes manually at this point before the unmount works - why doesn't the systemctl stop take care of that?)
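
For completeness, the crude workaround I use (which I'd obviously prefer not to need) is:

pkill glusterfs
umount /gluster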

With the node in standby (just one is online in this example, but another is configured), I then set up the resources:

pcs node standby
pcs resource create gluster_data ocf:heartbeat:Filesystem device="/dev/cl/lv_drbd" directory="/gluster" fstype="xfs"
pcs resource create glusterd ocf:glusterfs:glusterd
pcs resource create gluster_vol ocf:glusterfs:volume volname="test_logs"
pcs resource create test_logs ocf:heartbeat:Filesystem \
    device="localhost:/test_logs" directory="/var/log/test" fstype="nfs" \
    options="vers=3,tcp,nolock,context=system_u:object_r:httpd_sys_content_t:s0" \
    op monitor OCF_CHECK_LEVEL="20"
pcs resource clone glusterd
pcs resource clone gluster_data
pcs resource clone gluster_vol ordered=true
pcs constraint order start gluster_data-clone then start glusterd-clone
pcs constraint order start glusterd-clone then start gluster_vol-clone
pcs constraint order start gluster_vol-clone then start test_logs
pcs constraint colocation add test_logs with FloatingIp INFINITY
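
With everything defined, it doesn't hurt to review the rules and make sure nothing was mistyped:

pcs constraint show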

(note the SELinux wrangling - this is because I have a CGI web application which will later need to read files from the /var/log/test mount)
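
Once the mount is up, a quick way to verify that the context took effect (assuming the context= mount option behaves over NFS as it does locally):

ls -Zd /var/log/test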

At this point, even with the node in standby, it's already failing:

[root@pcmk01 ~]# pcs status
Cluster name: test_cluster
Stack: corosync
Current DC: pcmk01-cr (version 1.1.15-11.el7_3.5-e174ec8) - partition WITHOUT quorum
Last updated: Thu Dec  7 13:20:41 2017          Last change: Thu Dec  7 13:09:33 2017 by root via crm_attribute on pcmk01-cr

2 nodes and 13 resources configured

Online: [ pcmk01-cr ]
OFFLINE: [ pcmk02-cr ]

Full list of resources:

 FloatingIp     (ocf::heartbeat:IPaddr2):       Started pcmk01-cr
 test_logs      (ocf::heartbeat:Filesystem):    Stopped
 Clone Set: glusterd-clone [glusterd]
     Stopped: [ pcmk01-cr pcmk02-cr ]
 Clone Set: gluster_data-clone [gluster_data]
     Stopped: [ pcmk01-cr pcmk02-cr ]
 Clone Set: gluster_vol-clone [gluster_vol]
     gluster_vol        (ocf::glusterfs:volume):        FAILED pcmk01-cr (blocked)
     Stopped: [ pcmk02-cr ]

Failed Actions:
* gluster_data_start_0 on pcmk01-cr 'not configured' (6): call=72, status=complete, exitreason='DANGER! xfs on /dev/cl/lv_drbd is NOT cluster-aware!',
    last-rc-change='Thu Dec  7 13:09:28 2017', queued=0ms, exec=250ms
* gluster_vol_stop_0 on pcmk01-cr 'unknown error' (1): call=60, status=Timed Out, exitreason='none',
    last-rc-change='Thu Dec  7 12:55:11 2017', queued=0ms, exec=20004ms


Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

1. The data mount can't be created? Why?
2. Why is there a volume "stop" command being attempted, and why does it fail?
3. Why is any of this happening in standby? I can't have the resources failing before I've even made the node live! I could understand why a gluster_vol start operation would fail when glusterd is (correctly) stopped, but why is there a *stop* operation? And why does that make the resource "blocked"?

Given the above steps, is there something fundamental I'm missing about how these resource agents should be used? How do *you* configure GlusterFS on Pacemaker?

Any advice appreciated.

Best regards


* https://bugzilla.redhat.com/show_bug.cgi?id=1233344




_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users

