Re: Info on clvmd with halvm on rhel 6.3 based clusters

Ryan Mitchell <rmitchel@xxxxxxxxxx> · Fri, 05 Jul 2013 10:42:47 +1000

Hi,

On 07/05/2013 01:03 AM, Gianluca Cecchi wrote:
Hello,
I have already read these technotes, and my configuration seems
consistent with them:

https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/ap-ha-halvm-CA.html
https://access.redhat.com/site/solutions/409813

basically I would like to use clvmd with ha-lvm (as recommended) and
set up the cluster service with resources like this:

                 <resources>
                         <lvm lv_name="lv_prova" name="lv_prova" vg_name="VG_PROVA"/>
                         <fs device="/dev/VG_PROVA/lv_prova" force_fsck="0" force_unmount="1" fsid="50013" fstype="ext3" mountpoint="/PROVA" name="PROVA" options="" self_fence="1"/>
                 </resources>

                 <service autostart="1" domain="MYDOM" name="MYSERVICE">
                         <lvm ref="lv_prova"/>
                         <fs ref="PROVA"/>
                 </service>

The problem is that if I start both nodes, clvmd activates all the VGs
when it starts, because of

action "Activating VG(s):" ${lvm_vgchange} -ayl $LVM_VGS || return $?

in the init script for clvmd, with $LVM_VGS empty.

So when the service starts, LV activation fails (because the LV is
already active) and the service goes into the failed state.
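
One way to confirm that the LV is already active before rgmanager tries to start the service is to look at its attribute string. The sketch below is only illustrative: lv_is_active is a hypothetical helper, and it assumes that the fifth character of the lv_attr field reported by lvs is "a" for an active volume.

```shell
# Hypothetical helper: succeeds if an lv_attr string marks the LV as active.
# Assumption: the fifth character of lv_attr is the state field ("a" = active).
lv_is_active() {
    # lvs pads its output, so strip all whitespace first
    attr=$(printf '%s' "$1" | tr -d '[:space:]')
    case "$attr" in
        ????a*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example use on a cluster node (not run here):
#   lv_is_active "$(lvs --noheadings -o lv_attr VG_PROVA/lv_prova)" && echo "already active"
```

If the helper reports the LV as active right after clvmd has started and before rgmanager has initialized its services, that matches the situation described above.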

You aren't starting rgmanager with the -N option, are you?  It is not the default.
# man clurgmgrd
       -N     Do  not  perform  stop-before-start.  Combined with the -Z flag to clusvcadm, this can be used to allow rgmanager to be upgraded
              without stopping a given user service or set of services.

What is supposed to happen is:
- clvmd is started at boot time, and all clustered logical volumes are activated (including CLVM HA-LVM volumes)
- rgmanager starts after clvmd, and it initializes all resources to ensure they are in a known state.  For example:
Jul  4 20:06:26 r6ha1 rgmanager[2478]: I am node #1
Jul  4 20:06:27 r6ha1 rgmanager[2478]: Resource Group Manager Starting
Jul  4 20:06:27 r6ha1 rgmanager[2478]: Loading Service Data
Jul  4 20:06:33 r6ha1 rgmanager[2478]: Initializing Services                  <----
Jul  4 20:06:33 r6ha1 rgmanager[3316]: [fs] stop: Could not match /dev/vgdata/lvmirror with a real device
Jul  4 20:06:33 r6ha1 rgmanager[2478]: stop on fs "fsdata" returned 2 (invalid argument(s))
Jul  4 20:06:35 r6ha1 rgmanager[2478]: Services Initialized
Jul  4 20:06:35 r6ha1 rgmanager[2478]: State change: Local UP
Jul  4 20:06:35 r6ha1 rgmanager[2478]: State change: r6ha2.cluster.net UP
- So when rgmanager starts, it stops the CLVM HA-LVM logical volumes again prior to starting the service, unless you disabled the "stop-before-start" option.
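
To rule out the -N case, the command line of the running daemon can be inspected. This is a minimal sketch, not a definitive check: has_dash_N is a hypothetical helper that only looks for -N as a separate word, and the process name "rgmanager" is taken from the log lines above.

```shell
# Hypothetical helper: succeeds if the given command line contains -N
# as a standalone option (it does not detect combined short flags).
has_dash_N() {
    case " $1 " in
        *" -N "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example use on a cluster node (not run here):
#   has_dash_N "$(ps -o args= -C rgmanager)" && echo "stop-before-start is disabled"
```

If the helper succeeds, rgmanager will not stop the resources before starting the service, and the volumes activated by clvmd stay active.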

I did a quick test and I got the same results as you.  Can you show your resource/service definitions and the logs of when rgmanager starts up?

My system is registered with rhsm and bound to 6.3 release.
Current packages
lvm2-cluster-2.02.95-10.el6_3.3.x86_64
cman-3.0.12.1-32.el6_3.2.x86_64
lvm2-2.02.95-10.el6_3.3.x86_64

I can solve my problem by changing the clvmd init script to behave as in
RHEL 5.9, where activation is guarded by a conditional statement.
The diff between my script and the original 6.3 clvmd init script is now:

$ diff clvmd clvmd.orig
32,34d31
< # Activate & deactivate clustered LVs
< CLVMD_ACTIVATE_VOLUMES=1
<
91,92c88
< if [ -n "$CLVMD_ACTIVATE_VOLUMES" ] ; then
< ${lvm_vgscan} > /dev/null 2>&1
---
> ${lvm_vgscan} > /dev/null 2>&1
94,95c90
< action "Activating VG(s):" ${lvm_vgchange} -ayl $LVM_VGS || return $?
< fi
---
> action "Activating VG(s):" ${lvm_vgchange} -ayl $LVM_VGS || return $?

Then I set this in /etc/sysconfig/clvmd:
CLVMD_ACTIVATE_VOLUMES=""

Now start, stop and relocate all seem to work.

This is another option, but it shouldn't be required if rgmanager is allowed to stop the resources prior to starting the service.  We could raise an RFE to add this functionality to RHEL 6 if a case is opened.
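
For reference, the effect of the guard in the modified script can be sketched as follows. This is a simplified illustration that prints the command instead of running it; activate_volumes is a hypothetical name, and it assumes /etc/sysconfig/clvmd is sourced after the default value is set inside the init script.

```shell
# Sketch of the guard added by the diff; prints the command instead of running it.
# Assumption: /etc/sysconfig/clvmd is sourced after this default is set.
CLVMD_ACTIVATE_VOLUMES=1        # default inside the modified init script

activate_volumes() {
    if [ -n "$CLVMD_ACTIVATE_VOLUMES" ]; then
        # An empty LVM_VGS means vgchange acts on every VG it can see
        echo "vgchange -ayl $LVM_VGS"
    else
        echo "activation skipped"
    fi
}
```

With the default left in place the function prints the vgchange command; after CLVMD_ACTIVATE_VOLUMES="" is set, it reports that activation is skipped, which leaves activation to the lvm resource in the service.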

Among the technotes for 6.4 I only see this:

BZ #729812
Prior to this update, occasional service failures occurred when
starting the clvmd variant of the HA-LVM service on multiple nodes in a
cluster at the same time. The start of an HA-LVM resource coincided with
another node initializing that same HA-LVM resource. With this update,
a patch has been introduced to synchronize the initialization of both
resources. As a result, services no longer fail due to the simultaneous
initialization.

but I'm not sure whether it is related to my problem, as the bug is private.

This is only related to starting the HA-LVM resources simultaneously on multiple nodes. The patch synchronizes them correctly so that the resource can only start on one node.

Can anyone give an opinion?
I'm going to open a case with Red Hat, but I would like to understand
whether I'm missing something trivial, as I don't think I'm the
only one with this kind of configuration.

If you open a case with Red Hat, it may find its way to me and we can troubleshoot further.

Thanks in advance,

Gianluca

Regards,

Ryan Mitchell
Red Hat Global Support Services
