CLVM hangs after a node is fenced in a 2-node cluster

Description of problem: In a 2-node cluster, after one node is fenced, any CLVM command hangs on the remaining node. When the fenced node comes back into the cluster, any CLVM command hangs on it as well; moreover the node does not activate any clustered VG and therefore has no access to the shared devices.
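
For reference, this is roughly how I check the state when the hang occurs, assuming the standard RHEL 5 cman/clvmd tools (adjust to your setup):

   cman_tool status        # quorum and membership as cman sees them
   cman_tool nodes         # per-node state
   group_tool              # fence/dlm/rgmanager groups and their members
   service clvmd status    # is clvmd itself still running?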


Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux 5.2, updated with:
      device-mapper-1.02.28-2.el5.x86_64.rpm
      lvm2-2.02.40-6.el5.x86_64.rpm
      lvm2-cluster-2.02.40-7.el5.x86_64.rpm


Steps to Reproduce:
1. 2-node cluster, quorum formed with qdisk
2. Cold boot node 2
3. Node 2 is evicted and fenced; services are taken over by node 1
4. Node 2 comes back into the cluster and is quorate, but no clustered VGs are activated and any LVM-related command hangs on it
5. At this point every LVM command also hangs on node 1


Expected results: node 2 should be able to reacquire its locks on the clustered LVM volumes, and node 1 should be able to issue any LVM-related command.
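
What I would normally try at that point (stock init scripts, not a guaranteed fix) is to restart clvmd on node 2 and reactivate the clustered VGs; here these commands hang just like every other LVM command:

   service clvmd restart   # restart the cluster LVM daemon
   vgscan                  # rescan for volume groups
   vgchange -ay            # should activate the clustered VGs through clvmd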

Here are my cluster.conf and lvm.conf:
<?xml version="1.0"?>
<cluster alias="rome" config_version="53" name="rome">
       <fence_daemon clean_start="0" post_fail_delay="9" post_join_delay="6"/>
       <clusternodes>
               <clusternode name="romulus.fr" nodeid="1" votes="1">
                       <fence>
                               <method name="1">
                                       <device name="ilo172"/>
                               </method>
                       </fence>
               </clusternode>
               <clusternode name="remus.fr" nodeid="2" votes="1">
                       <fence>
                               <method name="1">
                                       <device name="ilo173"/>
                               </method>
                       </fence>
               </clusternode>
       </clusternodes>
       <cman expected_votes="3"/>
       <totem consensus="4800" join="60" token="21002" token_retransmits_before_loss_const="20"/>
       <fencedevices>
                <fencedevice agent="fence_ilo" hostname="X.X.X.X" login="Administrator" name="ilo172" passwd="X.X.X.X"/>
                <fencedevice agent="fence_ilo" hostname="XXXX" login="Administrator" name="ilo173" passwd="XXXX"/>
       </fencedevices>
       <rm>
                <failoverdomains/>
                <resources/>
                <vm autostart="1" exclusive="0" migrate="live" name="alfrescoP64" path="/etc/xen" recovery="relocate"/>
                <vm autostart="1" exclusive="0" migrate="live" name="alfrescoI64" path="/etc/xen" recovery="relocate"/>
                <vm autostart="1" exclusive="0" migrate="live" name="alfrescoS64" path="/etc/xen" recovery="relocate"/>
       </rm>
       <quorumd interval="3" label="quorum64" min_score="1" tko="30" votes="1">
                <heuristic interval="2" program="ping -c3 -t2 X.X.X.X" score="1"/>
       </quorumd>
</cluster>

Part of lvm.conf:
   # Type 3 uses built-in clustered locking.
   locking_type = 3

   # If using external locking (type 2) and initialisation fails,
   # with this set to 1 an attempt will be made to use the built-in
   # clustered locking.
   # If you are using a customised locking_library you should set this to 0.
   fallback_to_clustered_locking = 0

   # If an attempt to initialise type 2 or type 3 locking failed, perhaps
   # because cluster components such as clvmd are not running, with this set
   # to 1 an attempt will be made to use local file-based locking (type 1).
   # If this succeeds, only commands against local volume groups will proceed.
   # Volume Groups marked as clustered will be ignored.
   fallback_to_local_locking = 1

   # Local non-LV directory that holds file-based locks while commands are
   # in progress. A directory like /tmp that may get wiped on reboot is OK.
   locking_dir = "/var/lock/lvm"

   # Other entries can go here to allow you to load shared libraries
   # e.g. if support for LVM1 metadata was compiled as a shared library use
   #   format_libraries = "liblvm2format1.so"
   # Full pathnames can be given.

   # Search this directory first for shared libraries.
   #   library_dir = "/lib"

   # The external locking library to load if locking_type is set to 2.
   #   locking_library = "liblvm2clusterlock.so"
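
As a side note: when clvmd is unreachable, the clustered VG metadata can still be read by overriding the locking type on the command line. This bypasses cluster locking entirely, so I only use it read-only for diagnosis; the override below is just a sketch:

   vgs --config 'global { locking_type = 0 }'                        # no locking, inspection only
   vgdisplay --config 'global { locking_type = 0 }' VGvmalfrescoS64  # look at one clustered VG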


Part of the LVM log on the second node:

vgchange.c:165   Activated logical volumes in volume group "VolGroup00"
vgchange.c:172   7 logical volume(s) in volume group "VolGroup00" now active
cache/lvmcache.c:1220   Wiping internal VG cache
commands/toolcontext.c:188   Logging initialised at Wed Jun  3 15:17:29 2009
commands/toolcontext.c:209   Set umask to 0077
locking/cluster_locking.c:83 connect() failed on local socket: Connection refused
locking/locking.c:259   WARNING: Falling back to local file-based locking.
locking/locking.c:261 Volume Groups with the clustered attribute will be inaccessible.
toollib.c:578   Finding all volume groups
toollib.c:491   Finding volume group "VGhomealfrescoS64"
metadata/metadata.c:2379   Skipping clustered volume group VGhomealfrescoS64
toollib.c:491   Finding volume group "VGhomealfS64"
metadata/metadata.c:2379   Skipping clustered volume group VGhomealfS64
toollib.c:491   Finding volume group "VGvmalfrescoS64"
metadata/metadata.c:2379   Skipping clustered volume group VGvmalfrescoS64
toollib.c:491   Finding volume group "VGvmalfrescoI64"
metadata/metadata.c:2379   Skipping clustered volume group VGvmalfrescoI64
toollib.c:491   Finding volume group "VGvmalfrescoP64"
metadata/metadata.c:2379   Skipping clustered volume group VGvmalfrescoP64
toollib.c:491   Finding volume group "VolGroup00"
libdm-report.c:981   VolGroup00
cache/lvmcache.c:1220   Wiping internal VG cache
commands/toolcontext.c:188   Logging initialised at Wed Jun  3 15:17:29 2009
commands/toolcontext.c:209   Set umask to 0077
locking/cluster_locking.c:83 connect() failed on local socket: Connection refused
locking/locking.c:259   WARNING: Falling back to local file-based locking.
locking/locking.c:261 Volume Groups with the clustered attribute will be inaccessible.
toollib.c:542   Using volume group(s) on command line
toollib.c:491   Finding volume group "VolGroup00"
vgchange.c:117   7 logical volume(s) in volume group "VolGroup00" monitored
cache/lvmcache.c:1220   Wiping internal VG cache
commands/toolcontext.c:188   Logging initialised at Wed Jun  3 15:20:45 2009
commands/toolcontext.c:209   Set umask to 0077
toollib.c:331   Finding all logical volumes
commands/toolcontext.c:188   Logging initialised at Wed Jun  3 15:20:50 2009
commands/toolcontext.c:209   Set umask to 0077
toollib.c:578   Finding all volume groups
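
The "connect() failed on local socket: Connection refused" lines mean the LVM commands could not reach clvmd at that point, so they fell back to local file-based locking and skipped every clustered VG. What I check afterwards, assuming the stock init scripts, is roughly:

   service clvmd status          # did clvmd actually come up after the reboot?
   service clvmd restart         # may itself hang if the DLM lockspace is stuck
   vgchange -ay VGvmalfrescoS64  # then retry activation of one clustered VG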


group_tool on node 1
type             level name       id       state
fence            0     default    00010001 none
[1 2]
dlm              1     clvmd      00010002 none
[1 2]
dlm              1     rgmanager  00020002 none
[1]


group_tool on node 2
[root@remus ~]# group_tool
type             level name       id       state
fence            0     default    00010001 none
[1 2]
dlm              1     clvmd      00010002 none
[1 2]
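
Both nodes are members of the clvmd DLM group in both outputs, but the rgmanager group appears only on node 1 (with node 1 as its sole member) and is missing entirely on node 2. To see where the groups are stuck I dump the groupd/fenced state (RHEL 5 tool set assumed):

   group_tool ls           # same summary as above
   group_tool dump         # groupd debug log, shows pending membership changes
   group_tool dump fence   # fenced debug log, shows whether a fence operation is still pending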

Additional info:

