Hi list!

I have a question about fence_xvm.

The situation is: one physical server with Xen --> dom0 with 2 domU.
The cluster works fine between the domUs (reboot, relocate). I'm using
Red Hat 5.5.

The problem is with fencing from dom0 with "fence_xvm -H oelcl2": the domU
is destroyed, but when it boots back up it can't rejoin the cluster. The
domU takes a very long time to boot --> FENCED_START_TIMEOUT=300

On the console I get this after node2 is up:

node2:

INFO: task clurgmgrd:2127 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
clurgmgrd     D 0000000000000010     0  2127   2126                (NOTLB)
 ffff88006f08dda8 0000000000000286 ffff88007cc0b810 0000000000000000
 0000000000000003 ffff880072009860 ffff880072f6b0c0 00000000000455ec
 ffff880072009a48 ffffffff802649d7
Call Trace:
 [<ffffffff802649d7>] _read_lock_irq+0x9/0x19
 [<ffffffff8021420e>] filemap_nopage+0x193/0x360
 [<ffffffff80263a7e>] __mutex_lock_slowpath+0x60/0x9b
 [<ffffffff80263ac8>] .text.lock.mutex+0xf/0x14
 [<ffffffff88424b64>] :dlm:dlm_new_lockspace+0x2c/0x860
 [<ffffffff80222b08>] __up_read+0x19/0x7f
 [<ffffffff802d0abb>] __kmalloc+0x8f/0x9f
 [<ffffffff8842b6fa>] :dlm:device_write+0x438/0x5e5
 [<ffffffff80217377>] vfs_write+0xce/0x174
 [<ffffffff80217bc4>] sys_write+0x45/0x6e
 [<ffffffff802602f9>] tracesys+0xab/0xb6

During boot on node2:

Starting clvmd: dlm: Using TCP for communications
clvmd startup timed out
[FAILED]

node2:

[root@oelcl2 init.d]# clustat
Cluster Status for cluster1 @ Sat Sep 11 18:11:21 2010
Member Status: Quorate

 Member Name        ID   Status
 ------ ----        ---- ------
 oelcl1             1    Online
 oelcl2             2    Online, Local

[root@oelcl2 init.d]#

On the first node:

[root@oelcl1 ~]# clustat
Cluster Status for cluster1 @ Sat Sep 11 18:12:07 2010
Member Status: Quorate

 Member Name        ID   Status
 ------ ----        ---- ------
 oelcl1             1    Online, Local, rgmanager
 oelcl2             2    Online, rgmanager

 Service Name       Owner (Last)       State
 ------- ----       ----- ------       -----
 service:webby      oelcl1             started

[root@oelcl1 ~]#

After that I have to destroy both domU guests and create them again to get
node2 working again.

I followed the how-tos at https://access.redhat.com/kb/docs/DOC-5937 and
http://sources.redhat.com/cluster/wiki/VMClusterCookbook

Cluster config on dom0:

<?xml version="1.0"?>
<cluster alias="vmcluster" config_version="1" name="vmcluster">
  <clusternodes>
    <clusternode name="vm5" nodeid="1" votes="1"/>
  </clusternodes>
  <cman/>
  <fencedevices/>
  <rm/>
  <fence_xvmd/>
</cluster>

Cluster config on domU:

<?xml version="1.0"?>
<cluster alias="cluster1" config_version="49" name="cluster1">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="4"/>
  <clusternodes>
    <clusternode name="oelcl1.name.comi" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device domain="oelcl1" name="xenfence1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="oelcl2.name.com" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device domain="oelcl2" name="xenfence1"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_xvm" name="xenfence1"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="prefer_node1" nofailback="0" ordered="1" restricted="1">
        <failoverdomainnode name="oelcl1.name.com" priority="1"/>
        <failoverdomainnode name="oelcl2.name.com" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="xx.xx.xx.xx" monitor_link="1"/>
      <fs device="/dev/xvdb1" force_fsck="0" force_unmount="0" fsid="8669" fstype="ext3" mountpoint="/var/www/html" name="docroot" self_fence="0"/>
      <script file="/etc/init.d/httpd" name="apache_s"/>
    </resources>
    <service autostart="1" domain="prefer_node1" exclusive="0" name="webby" recovery="relocate">
      <ip ref="xx.xx.xx.xx"/>
      <fs ref="docroot"/>
      <script ref="apache_s"/>
    </service>
  </rm>
</cluster>

Fence processes on dom0:

[root@vm5 cluster]# ps -ef |grep fenc
root     18690     1  0 17:40 ?        00:00:00 /sbin/fenced
root     18720     1  0 17:40 ?        00:00:00 /sbin/fence_xvmd -I xenbr0
root     22633 14524  0 18:21 pts/3    00:00:00 grep fenc
[root@vm5 cluster]#

and on domU:

[root@oelcl1 ~]# ps -ef|grep fen
root      1523     1  0 17:41 ?        00:00:00 /sbin/fenced
root     13695  2902  0 18:22 pts/0    00:00:00 grep fen
[root@oelcl1 ~]#

Does anybody have any idea why fencing doesn't work?

thx
br
jost
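
P.S. Separately from the fencing question, the names in the domU cluster.conf can be checked for consistency. In the config as pasted above, the first clusternode is named oelcl1.name.comi while the failover domain refers to oelcl1.name.com; this may only be a paste typo and is not necessarily related to the hang. Below is a minimal sketch of such a check in Python. The function name find_name_mismatches and the trimmed SAMPLE string are made up for illustration and are not part of any cluster tool.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the domU cluster.conf from this post (names kept as pasted).
SAMPLE = """\
<cluster alias="cluster1" config_version="49" name="cluster1">
  <clusternodes>
    <clusternode name="oelcl1.name.comi" nodeid="1" votes="1">
      <fence><method name="1"><device domain="oelcl1" name="xenfence1"/></method></fence>
    </clusternode>
    <clusternode name="oelcl2.name.com" nodeid="2" votes="1">
      <fence><method name="1"><device domain="oelcl2" name="xenfence1"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_xvm" name="xenfence1"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="prefer_node1" nofailback="0" ordered="1" restricted="1">
        <failoverdomainnode name="oelcl1.name.com" priority="1"/>
        <failoverdomainnode name="oelcl2.name.com" priority="2"/>
      </failoverdomain>
    </failoverdomains>
  </rm>
</cluster>
"""


def find_name_mismatches(root):
    """Hypothetical helper: list names that are referenced but never defined."""
    node_names = {n.get("name") for n in root.iter("clusternode")}
    device_names = {d.get("name") for d in root.iter("fencedevice")}
    problems = []

    # Every failover domain member should be a defined cluster node.
    for member in root.iter("failoverdomainnode"):
        name = member.get("name")
        if name not in node_names:
            problems.append(
                "failoverdomainnode %r has no matching clusternode" % name)

    # Every fence <device> under a node should point at a defined fencedevice.
    for node in root.iter("clusternode"):
        for dev in node.iter("device"):
            if dev.get("name") not in device_names:
                problems.append(
                    "clusternode %r uses undefined fence device %r"
                    % (node.get("name"), dev.get("name")))
    return problems


problems = find_name_mismatches(ET.fromstring(SAMPLE))
for line in problems:
    print(line)
```

To run it against a real file instead of the sample, the root could be loaded with ET.parse(path).getroot(), where path is wherever the config lives on the node (assumed here to be /etc/cluster/cluster.conf).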