Description of problem: In a two-node cluster, after one node is fenced, any clvm command hangs on the remaining node. When the fenced node comes back into the cluster, any clvm command also hangs; moreover the node does not activate any clustered VG, and so cannot access any shared device.
Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux 5.2, updated with:
device-mapper-1.02.28-2.el5.x86_64.rpm
lvm2-2.02.40-6.el5.x86_64.rpm
lvm2-cluster-2.02.40-7.el5.x86_64.rpm
Steps to Reproduce:
1. Two-node cluster, quorum formed with qdisk.
2. Cold-boot node 2.
3. Node 2 is evicted and fenced; services are taken over by node 1.
4. Node 2 comes back into the cluster and is quorate, but no clustered VGs are activated and any LVM-related command hangs.
5. At this point every LVM command also hangs on node 1.
Expected results: node 2 should be able to reacquire the locks on the clustered LVM volumes, and node 1 should be able to issue any LVM-related command.
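For reference, the kind of commands that hang at steps 4 and 5, and the standard tools for looking at the cluster state while they hang, are roughly the following (a sketch assuming the stock RHEL 5.2 cluster utilities; the exact invocations are illustrative):

  # any command that needs the LVM global lock hangs, e.g.
  vgs
  vgchange -a y VGvmalfrescoP64

  # cluster state while the commands hang
  cman_tool status        # membership and quorum
  group_tool              # fence/dlm groups (outputs further below)
  service clvmd status    # is the cluster LVM daemon running?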
Here are my cluster.conf and lvm.conf:
<?xml version="1.0"?>
<cluster alias="rome" config_version="53" name="rome">
    <fence_daemon clean_start="0" post_fail_delay="9" post_join_delay="6"/>
    <clusternodes>
        <clusternode name="romulus.fr" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device name="ilo172"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="remus.fr" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device name="ilo173"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman expected_votes="3"/>
    <totem consensus="4800" join="60" token="21002" token_retransmits_before_loss_const="20"/>
    <fencedevices>
        <fencedevice agent="fence_ilo" hostname="X.X.X.X" login="Administrator" name="ilo172" passwd="X.X.X.X"/>
        <fencedevice agent="fence_ilo" hostname="XXXX" login="Administrator" name="ilo173" passwd="XXXX"/>
    </fencedevices>
    <rm>
        <failoverdomains/>
        <resources/>
        <vm autostart="1" exclusive="0" migrate="live" name="alfrescoP64" path="/etc/xen" recovery="relocate"/>
        <vm autostart="1" exclusive="0" migrate="live" name="alfrescoI64" path="/etc/xen" recovery="relocate"/>
        <vm autostart="1" exclusive="0" migrate="live" name="alfrescoS64" path="/etc/xen" recovery="relocate"/>
    </rm>
    <quorumd interval="3" label="quorum64" min_score="1" tko="30" votes="1">
        <heuristic interval="2" program="ping -c3 -t2 X.X.X.X" score="1"/>
    </quorumd>
</cluster>
Part of lvm.conf:
# Type 3 uses built-in clustered locking.
locking_type = 3
# If using external locking (type 2) and initialisation fails,
# with this set to 1 an attempt will be made to use the built-in
# clustered locking.
# If you are using a customised locking_library you should set this to 0.
fallback_to_clustered_locking = 0
# If an attempt to initialise type 2 or type 3 locking failed, perhaps
# because cluster components such as clvmd are not running, with this set
# to 1 an attempt will be made to use local file-based locking (type 1).
# If this succeeds, only commands against local volume groups will proceed.
# Volume Groups marked as clustered will be ignored.
fallback_to_local_locking = 1
# Local non-LV directory that holds file-based locks while commands are
# in progress. A directory like /tmp that may get wiped on reboot is OK.
locking_dir = "/var/lock/lvm"
# Other entries can go here to allow you to load shared libraries
# e.g. if support for LVM1 metadata was compiled as a shared library use
# format_libraries = "liblvm2format1.so"
# Full pathnames can be given.
# Search this directory first for shared libraries.
# library_dir = "/lib"
# The external locking library to load if locking_type is set to 2.
# locking_library = "liblvm2clusterlock.so"
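With locking_type = 3 and fallback_to_local_locking = 1, an LVM command that cannot reach clvmd falls back to file-based locking and simply skips clustered VGs, which is what the boot-time log below shows. To double-check the locking settings actually in effect (a sketch; lvm dumpconfig ships with lvm2 on RHEL 5, though its output format may vary):

  grep -E '^[[:space:]]*(locking_type|fallback_to_local_locking)' /etc/lvm/lvm.conf
  lvm dumpconfig | grep locking    # settings as parsed by the tools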
Part of the LVM log on the second node:
vgchange.c:165 Activated logical volumes in volume group "VolGroup00"
vgchange.c:172 7 logical volume(s) in volume group "VolGroup00" now active
cache/lvmcache.c:1220 Wiping internal VG cache
commands/toolcontext.c:188 Logging initialised at Wed Jun 3 15:17:29 2009
commands/toolcontext.c:209 Set umask to 0077
locking/cluster_locking.c:83 connect() failed on local socket: Connection refused
locking/locking.c:259 WARNING: Falling back to local file-based locking.
locking/locking.c:261 Volume Groups with the clustered attribute will be inaccessible.
toollib.c:578 Finding all volume groups
toollib.c:491 Finding volume group "VGhomealfrescoS64"
metadata/metadata.c:2379 Skipping clustered volume group VGhomealfrescoS64
toollib.c:491 Finding volume group "VGhomealfS64"
metadata/metadata.c:2379 Skipping clustered volume group VGhomealfS64
toollib.c:491 Finding volume group "VGvmalfrescoS64"
metadata/metadata.c:2379 Skipping clustered volume group VGvmalfrescoS64
toollib.c:491 Finding volume group "VGvmalfrescoI64"
metadata/metadata.c:2379 Skipping clustered volume group VGvmalfrescoI64
toollib.c:491 Finding volume group "VGvmalfrescoP64"
metadata/metadata.c:2379 Skipping clustered volume group VGvmalfrescoP64
toollib.c:491 Finding volume group "VolGroup00"
libdm-report.c:981 VolGroup00
cache/lvmcache.c:1220 Wiping internal VG cache
commands/toolcontext.c:188 Logging initialised at Wed Jun 3 15:17:29 2009
commands/toolcontext.c:209 Set umask to 0077
locking/cluster_locking.c:83 connect() failed on local socket: Connection refused
locking/locking.c:259 WARNING: Falling back to local file-based locking.
locking/locking.c:261 Volume Groups with the clustered attribute will be inaccessible.
toollib.c:542 Using volume group(s) on command line
toollib.c:491 Finding volume group "VolGroup00"
vgchange.c:117 7 logical volume(s) in volume group "VolGroup00" monitored
cache/lvmcache.c:1220 Wiping internal VG cache
commands/toolcontext.c:188 Logging initialised at Wed Jun 3 15:20:45 2009
commands/toolcontext.c:209 Set umask to 0077
toollib.c:331 Finding all logical volumes
commands/toolcontext.c:188 Logging initialised at Wed Jun 3 15:20:50 2009
commands/toolcontext.c:209 Set umask to 0077
toollib.c:578 Finding all volume groups
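The "connect() failed on local socket: Connection refused" lines suggest clvmd was not yet running (or not accepting connections) when these boot-time vgchange calls ran, so the clustered VGs were skipped. The manual recovery that would normally follow looks like this (a sketch assuming the standard RHEL 5 init scripts; in the failure described above the vgchange simply hangs):

  service clvmd restart
  vgchange -a y VGhomealfrescoS64    # hangs instead of activating the VG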
group_tool output on node 1:
type level name id state
fence 0 default 00010001 none
[1 2]
dlm 1 clvmd 00010002 none
[1 2]
dlm 1 rgmanager 00020002 none
[1]
group_tool output on node 2:
[root@remus ~]# group_tool
type level name id state
fence 0 default 00010001 none
[1 2]
dlm 1 clvmd 00010002 none
[1 2]
Additional info: