Hi Joel,

On Fri, 2010-10-01 at 15:09 +1000, Joel Heenan wrote:
> Are you saying that if you manually destroy the guest, then start it
> up it works?

No. I have to destroy both nodes.

> I don't think your problem is with fencing, I think it's that the two
> guests are not joining correctly. It seems like the fencing part is
> working.
>
> Do the logs in /var/log/messages show that one node successfully fenced
> the other? What is the output of group_tool on both nodes after they
> have come up, this should help you debug it.

Yes:

Oct  1 11:04:39 clu5 fenced[1541]: fence "clu6.snt.si" success

node1:

[root@clu5 ~]# group_tool
type             level name       id       state
fence            0     default    00010001 JOIN_STOP_WAIT
[1 2 2]
dlm              1     clvmd      00020001 JOIN_STOP_WAIT
[1 2 2]
dlm              1     rgmanager  00010002 none
[1 2]
[root@clu5 ~]#
[root@clu5 ~]# group_tool dump fence
1285924843 our_nodeid 1 our_name clu5.snt.si
1285924843 listen 4 member 5 groupd 7
1285924846 client 3: join default
1285924846 delay post_join 3s post_fail 0s
1285924846 added 2 nodes from ccs
1285924846 setid default 65537
1285924846 start default 1 members 1
1285924846 do_recovery stop 0 start 1 finish 0
1285924846 finish default 1
1285924846 stop default
1285924846 start default 2 members 2 1
1285924846 do_recovery stop 1 start 2 finish 1
1285924846 finish default 2
1285924936 stop default
1285924985 client 3: dump
1285925065 client 3: dump
1285925281 client 3: dump
[root@clu5 ~]#

node2:

[root@clu6 ~]# group_tool
type             level name       id       state
fence            0     default    00000000 JOIN_STOP_WAIT
[1 2]
dlm              1     clvmd      00000000 JOIN_STOP_WAIT
[1 2]
[root@clu6 ~]#
[root@clu6 ~]# group_tool dump fence
1285924935 our_nodeid 2 our_name clu6.snt.si
1285924935 listen 4 member 5 groupd 7
1285924936 client 3: join default
1285924936 delay post_join 3s post_fail 0s
1285924936 added 2 nodes from ccs
1285925291 client 3: dump
[root@clu6 ~]#

thx
br
jost

________________________________________
From: linux-cluster-bounces@xxxxxxxxxx [linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Joel Heenan [joelh@xxxxxxxxxxxxxx]
Sent: Friday, October 01, 2010 7:09 AM
To: linux clustering
Subject: Re: fence in xen

Are you saying that if you manually destroy the guest, then start it up it works?

I don't think your problem is with fencing; I think it's that the two guests are not joining correctly. It seems like the fencing part is working.

Do the logs in /var/log/messages show that one node successfully fenced the other? What is the output of group_tool on both nodes after they have come up? This should help you debug it.

I don't think it's relevant, but this item from the FAQ may help:

http://sources.redhat.com/cluster/wiki/FAQ/Fencing#fence_stuck

Joel
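For reference, the group_tool output above shows the fence and clvmd groups on clu5 stuck in JOIN_STOP_WAIT, with node 2 listed twice, while clu6 never completes its own join. A minimal sketch of checking that the two guests agree on membership and can actually exchange cluster traffic; the interface name eth0 and the default openais port 5405 are assumptions, adjust them to your setup:

# on each guest: membership as cman sees it
cman_tool status
cman_tool nodes

# on one guest: watch for cluster traffic from the peer
# (eth0 and udp port 5405 are assumed here)
tcpdump -i eth0 -n udp port 5405

If the peers never see each other's traffic here, the problem is in the interconnect rather than in fencing itself.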
On Wed, Sep 22, 2010 at 7:08 PM, Rakovec Jost <Jost.Rakovec@xxxxxx> wrote:

Hi,

anybody any idea? Please help!

Now I can fence the node, but after booting it can't connect to the cluster.

On dom0:

fence_xvmd -LX -I xenbr0 -U xen:/// -fdddddddddddddd
ipv4_connect: Connecting to client
ipv4_connect: Success; fd = 12
Rebooting domain oelcl21...
[REBOOT] Calling virDomainDestroy(0x99cede0)
libvir: Xen error : Domain not found: xenUnifiedDomainLookupByName
[[ XML Domain Info ]]
<domain type='xen' id='41'>
  <name>oelcl21</name>
  <uuid>07e31b27-1ff1-4754-4f58-221e8d2057d6</uuid>
  <memory>1048576</memory>
  <currentMemory>1048576</currentMemory>
  <vcpu>2</vcpu>
  <bootloader>/usr/bin/pygrub</bootloader>
  <os>
    <type>linux</type>
  </os>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <disk type='block' device='disk'>
      <driver name='phy'/>
      <source dev='/dev/vg_datastore/oelcl21'/>
      <target dev='xvda' bus='xen'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='phy'/>
      <source dev='/dev/vg_datastore/skupni1'/>
      <target dev='xvdb' bus='xen'/>
      <shareable/>
    </disk>
    <interface type='bridge'>
      <mac address='00:16:3e:7c:60:aa'/>
      <source bridge='xenbr0'/>
      <script path='/etc/xen/scripts/vif-bridge'/>
      <target dev='vif41.0'/>
    </interface>
    <console type='pty' tty='/dev/pts/2'>
      <source path='/dev/pts/2'/>
      <target port='0'/>
    </console>
  </devices>
</domain>
[[ XML END ]]
Calling virDomainCreateLinux()..

On domU (node1):

fence_xvm -H oelcl21 -ddd

clustat on node1:

[root@oelcl11 ~]# clustat
Cluster Status for cluster2 @ Wed Sep 22 11:04:49 2010
Member Status: Quorate

 Member Name                  ID   Status
 ------ ----                  ---- ------
 oelcl11                         1 Online, Local, rgmanager
 oelcl21                         2 Online, rgmanager

 Service Name                 Owner (Last)                 State
 ------- ----                 ----- ------                 -----
 service:web                  oelcl11                      started
[root@oelcl11 ~]#

But node2 waits for 300 s and can't connect:

Starting daemons... done
Starting fencing...
Sep 22 10:41:06 oelcl21 kernel: eth0: no IPv6 routers present
done
                                                           [  OK  ]

[root@oelcl21 ~]# clustat
Cluster Status for cluster2 @ Wed Sep 22 11:04:19 2010
Member Status: Quorate

 Member Name                  ID   Status
 ------ ----                  ---- ------
 oelcl11                         1 Online
 oelcl21                         2 Online, Local
[root@oelcl21 ~]#

br
jost

________________________________________
From: linux-cluster-bounces@xxxxxxxxxx [linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Rakovec Jost [Jost.Rakovec@xxxxxx]
Sent: Monday, September 13, 2010 9:31 AM
To: linux clustering
Subject: Re: fence in xen

Hi,

Q: must fence_xvmd also run in the domU? I ask because I notice that if I run this on the host while fence_xvmd is running:

[root@oelcl1 ~]# fence_xvm -H oelcl2 -ddd -o null
Debugging threshold is now 3
-- args @ 0x7fffe3f71fb0 --
  args->addr = 225.0.0.12
  args->domain = oelcl2
  args->key_file = /etc/cluster/fence_xvm.key
  args->op = 0
  args->hash = 2
  args->auth = 2
  args->port = 1229
  args->ifindex = 0
  args->family = 2
  args->timeout = 30
  args->retr_time = 20
  args->flags = 0
  args->debug = 3
-- end args --
Reading in key file /etc/cluster/fence_xvm.key into 0x7fffe3f70f60 (4096 max size)
Actual key length = 4096 bytes
Sending to 225.0.0.12 via 127.0.0.1
Sending to 225.0.0.12 via 10.9.131.80
Sending to 225.0.0.12 via 10.9.131.83
Sending to 225.0.0.12 via 192.168.122.1
Waiting for connection from XVM host daemon.
Issuing TCP challenge
Responding to TCP challenge
TCP Exchange + Authentication done...
Waiting for return value from XVM host
Remote: Operation was successful

but if I try an actual fence (reboot), I get:

[root@oelcl1 ~]# fence_xvm -H oelc2
Remote: Operation was successful
[root@oelcl1 ~]#

and host2 does not reboot.
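One thing worth checking when fence_xvm reports "Operation was successful" but the guest keeps running is whether the name passed with -H exactly matches a Xen domain name known on dom0. A quick sketch; the xen:/// URI is the one used with fence_xvmd above, and xm/virsh are assumed to be available on the dom0:

# on dom0: list the domains fence_xvmd can act on
xm list
virsh -c xen:/// list --all

# the -H argument and the domain= attribute in cluster.conf
# should match one of the names listed here exactly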
If fence_xvmd is not running on the host, I get a timeout instead:

[root@oelcl1 sysconfig]# fence_xvm -H oelcl2 -ddd -o null
Debugging threshold is now 3
-- args @ 0x7fff1a6b5580 --
  args->addr = 225.0.0.12
  args->domain = oelcl2
  args->key_file = /etc/cluster/fence_xvm.key
  args->op = 0
  args->hash = 2
  args->auth = 2
  args->port = 1229
  args->ifindex = 0
  args->family = 2
  args->timeout = 30
  args->retr_time = 20
  args->flags = 0
  args->debug = 3
-- end args --
Reading in key file /etc/cluster/fence_xvm.key into 0x7fff1a6b4530 (4096 max size)
Actual key length = 4096 bytes
Sending to 225.0.0.12 via 127.0.0.1
Sending to 225.0.0.12 via 10.9.131.80
Waiting for connection from XVM host daemon.
Sending to 225.0.0.12 via 127.0.0.1
Sending to 225.0.0.12 via 10.9.131.80
Waiting for connection from XVM host daemon.

Q: How can I check whether multicast is OK?

Q: On which network interface must fence_xvmd run on dom0? I notice that the hosts (domU) have a virbr0 interface:

virbr0    Link encap:Ethernet  HWaddr 00:00:00:00:00:00
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:40 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:7212 (7.0 KiB)

and there is also a virbr0 on dom0. On dom0 (vm5) I tried both interfaces:

[root@vm5 ~]# fence_xvmd -fdd -I xenbr0
-- args @ 0xbfd26234 --
  args->addr = 225.0.0.12
  args->domain = (null)
  args->key_file = /etc/cluster/fence_xvm.key
  args->op = 2
  args->hash = 2
  args->auth = 2
  args->port = 1229
  args->ifindex = 7
  args->family = 2
  args->timeout = 30
  args->retr_time = 20
  args->flags = 1
  args->debug = 2
-- end args --
Opened ckpt vm_states
My Node ID = 1
Domain                   UUID                                 Owner State
------                   ----                                 ----- -----
Domain-0                 00000000-0000-0000-0000-000000000000 00001 00001
oelcl1                   2a53022c-5836-68f0-4514-02a5a0b07e81 00001 00002
oelcl2                   dd268dd4-f012-e0f7-7c77-aa8a58e1e6ab 00001 00002
oelcman                  09c783bd-9107-0916-ebbf-bd27bcc8babe 00001 00002
Storing oelcl1
Storing oelcl2

[root@vm5 ~]# fence_xvmd -fdd -I virbr0
-- args @ 0xbfd26234 --
  args->addr = 225.0.0.12
  args->domain = (null)
  args->key_file = /etc/cluster/fence_xvm.key
  args->op = 2
  args->hash = 2
  args->auth = 2
  args->port = 1229
  args->ifindex = 7
  args->family = 2
  args->timeout = 30
  args->retr_time = 20
  args->flags = 1
  args->debug = 2
-- end args --
Opened ckpt vm_states
My Node ID = 1
Domain                   UUID                                 Owner State
------                   ----                                 ----- -----
Domain-0                 00000000-0000-0000-0000-000000000000 00001 00001
oelcl1                   2a53022c-5836-68f0-4514-02a5a0b07e81 00001 00002
oelcl2                   dd268dd4-f012-e0f7-7c77-aa8a58e1e6ab 00001 00002
oelcman                  09c783bd-9107-0916-ebbf-bd27bcc8babe 00001 00002
Storing oelcl1
Storing oelcl2

No matter which interface I use, fencing is not done.

thx
br
jost
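To the multicast question above: one way to see whether the fence_xvm request ever reaches dom0 is to watch for the multicast traffic while sending a harmless request from a guest. A sketch using the address and port shown in the debug output (225.0.0.12, UDP 1229); xenbr0 is the bridge the guests are attached to in the domain XML above:

# on dom0, with fence_xvmd running:
tcpdump -i xenbr0 -n udp port 1229 and host 225.0.0.12

# on a domU, in another terminal:
fence_xvm -H oelcl2 -o null -ddd

If no packets appear on dom0, the multicast is being dropped between guest and bridge (firewall rules, bridge setup, or the wrong -I interface), and fencing cannot work regardless of the rest of the configuration.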
________________________________________
From: linux-cluster-bounces@xxxxxxxxxx [linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Rakovec Jost [Jost.Rakovec@xxxxxx]
Sent: Saturday, September 11, 2010 6:36 PM
To: linux-cluster@xxxxxxxxxx
Subject: fence in xen

Hi list!

I have a question about fence_xvm. The situation: one physical server running Xen (dom0) with two domU guests. The cluster between the domUs works fine (reboot, relocate). I'm using Red Hat 5.5.

The problem is fencing from dom0 with "fence_xvm -H oelcl2": the domU is destroyed, but when it is booted back it can't join the cluster. The domU takes a very long time to boot (FENCED_START_TIMEOUT=300), and on the console I get, after node2 is up:

node2:

INFO: task clurgmgrd:2127 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
clurgmgrd     D 0000000000000010     0  2127   2126                (NOTLB)
 ffff88006f08dda8 0000000000000286 ffff88007cc0b810 0000000000000000
 0000000000000003 ffff880072009860 ffff880072f6b0c0 00000000000455ec
 ffff880072009a48 ffffffff802649d7
Call Trace:
 [<ffffffff802649d7>] _read_lock_irq+0x9/0x19
 [<ffffffff8021420e>] filemap_nopage+0x193/0x360
 [<ffffffff80263a7e>] __mutex_lock_slowpath+0x60/0x9b
 [<ffffffff80263ac8>] .text.lock.mutex+0xf/0x14
 [<ffffffff88424b64>] :dlm:dlm_new_lockspace+0x2c/0x860
 [<ffffffff80222b08>] __up_read+0x19/0x7f
 [<ffffffff802d0abb>] __kmalloc+0x8f/0x9f
 [<ffffffff8842b6fa>] :dlm:device_write+0x438/0x5e5
 [<ffffffff80217377>] vfs_write+0xce/0x174
 [<ffffffff80217bc4>] sys_write+0x45/0x6e
 [<ffffffff802602f9>] tracesys+0xab/0xb6

During boot on node2:

Starting clvmd: dlm: Using TCP for communications
clvmd startup timed out
                                                           [FAILED]

node2:

[root@oelcl2 init.d]# clustat
Cluster Status for cluster1 @ Sat Sep 11 18:11:21 2010
Member Status: Quorate

 Member Name                  ID   Status
 ------ ----                  ---- ------
 oelcl1                          1 Online
 oelcl2                          2 Online, Local
[root@oelcl2 init.d]#

On the first node:

[root@oelcl1 ~]# clustat
Cluster Status for cluster1 @ Sat Sep 11 18:12:07 2010
Member Status: Quorate

 Member Name                  ID   Status
 ------ ----                  ---- ------
 oelcl1                          1 Online, Local, rgmanager
 oelcl2                          2 Online, rgmanager

 Service Name                 Owner (Last)                 State
 ------- ----                 ----- ------                 -----
 service:webby                oelcl1                       started
[root@oelcl1 ~]#

Then I have to destroy both domUs and create them again to get node2 working.

I have followed the how-tos at
https://access.redhat.com/kb/docs/DOC-5937
and
http://sources.redhat.com/cluster/wiki/VMClusterCookbook

Cluster config on dom0:

<?xml version="1.0"?>
<cluster alias="vmcluster" config_version="1" name="vmcluster">
        <clusternodes>
                <clusternode name="vm5" nodeid="1" votes="1"/>
        </clusternodes>
        <cman/>
        <fencedevices/>
        <rm/>
        <fence_xvmd/>
</cluster>

Cluster config on domU:

<?xml version="1.0"?>
<cluster alias="cluster1" config_version="49" name="cluster1">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="4"/>
        <clusternodes>
                <clusternode name="oelcl1.name.com" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device domain="oelcl1" name="xenfence1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="oelcl2.name.com" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device domain="oelcl2" name="xenfence1"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_xvm" name="xenfence1"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="prefer_node1" nofailback="0" ordered="1" restricted="1">
                                <failoverdomainnode name="oelcl1.name.com" priority="1"/>
                                <failoverdomainnode name="oelcl2.name.com" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="xx.xx.xx.xx" monitor_link="1"/>
                        <fs device="/dev/xvdb1" force_fsck="0" force_unmount="0" fsid="8669" fstype="ext3" mountpoint="/var/www/html" name="docroot" self_fence="0"/>
                        <script file="/etc/init.d/httpd" name="apache_s"/>
                </resources>
                <service autostart="1" domain="prefer_node1" exclusive="0" name="webby" recovery="relocate">
                        <ip ref="xx.xx.xx.xx"/>
                        <fs ref="docroot"/>
                        <script ref="apache_s"/>
                </service>
        </rm>
</cluster>
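The configs above rely on fence_xvm and fence_xvmd sharing the key file shown in the debug output (/etc/cluster/fence_xvm.key, 4096 bytes); the same key has to be present on dom0 and on every guest node. A sketch of generating and distributing it, using the host names from this thread:

# on dom0 (vm5): create a 4096-byte random key
dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=4096 count=1
chmod 600 /etc/cluster/fence_xvm.key

# copy the identical key to each guest
scp /etc/cluster/fence_xvm.key root@oelcl1:/etc/cluster/
scp /etc/cluster/fence_xvm.key root@oelcl2:/etc/cluster/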
Fence processes on dom0:

[root@vm5 cluster]# ps -ef | grep fenc
root     18690     1  0 17:40 ?        00:00:00 /sbin/fenced
root     18720     1  0 17:40 ?        00:00:00 /sbin/fence_xvmd -I xenbr0
root     22633 14524  0 18:21 pts/3    00:00:00 grep fenc
[root@vm5 cluster]#

and on domU:

[root@oelcl1 ~]# ps -ef | grep fen
root      1523     1  0 17:41 ?        00:00:00 /sbin/fenced
root     13695  2902  0 18:22 pts/0    00:00:00 grep fen
[root@oelcl1 ~]#

Does somebody have an idea why fencing doesn't work?

thx
br
jost

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster