"""
[root@clu5 ~]# group_tool
type             level name       id       state
fence            0     default    00010001 JOIN_STOP_WAIT
[1 2 2]
dlm              1     clvmd      00020001 JOIN_STOP_WAIT
[1 2 2]
dlm              1     rgmanager  00010002 none
[1 2]
""
To my understanding this means that the fence domain and the dlm group for clvmd both see two copies of node 2. You'll have to check how this happened: did cman start twice? Did you manually stop it and start it again?
Try disabling your firewall and getting both nodes up in a stable state; the state should be "none" everywhere. Once that is done, look at fencing again.
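For example (just a sketch, assuming the stock RHEL 5 init scripts and iptables), on each node something like:

# take the firewall out of the equation while testing
service iptables stop

# restart the cluster stack cleanly, top down then bottom up
service rgmanager stop
service clvmd stop
service cman stop
service cman start
service clvmd start
service rgmanager start

# membership should settle, with "none" in the state column
cman_tool nodes
group_tool

If the duplicate entry for node 2 is still there after a clean restart on both nodes, that would point at membership (cman/groupd) rather than at fencing.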
Joel
On Fri, Oct 1, 2010 at 11:42 PM, Rakovec Jost <Jost.Rakovec@xxxxxx> wrote:
Hi Joel,
No. I have to destroy both nodes.
On Fri, 2010-10-01 at 15:09 +1000, Joel Heenan wrote:
> Are you saying that if you manually destroy the guest, then start it
> up it works?
yes
>
> I don't think your problem is with fencing; I think it's that the two
> guests are not joining correctly. It seems like the fencing part is
> working.
>
> Do the logs in /var/log/messages show that one node successfully fenced
> the other? What is the output of group_tool on both nodes after they
> have come up? This should help you debug it.
>
Oct 1 11:04:39 clu5 fenced[1541]: fence "clu6.snt.si" success
node1
[root@clu5 ~]# group_tool
type             level name       id       state
fence            0     default    00010001 JOIN_STOP_WAIT
[1 2 2]
dlm              1     clvmd      00020001 JOIN_STOP_WAIT
[1 2 2]
dlm              1     rgmanager  00010002 none
[1 2]
[root@clu5 ~]#
[root@clu5 ~]#
[root@clu5 ~]# group_tool dump fence
1285924843 our_nodeid 1 our_name clu5.snt.si
1285924843 listen 4 member 5 groupd 7
1285924846 client 3: join default
1285924846 delay post_join 3s post_fail 0s
1285924846 added 2 nodes from ccs
1285924846 setid default 65537
1285924846 start default 1 members 1
1285924846 do_recovery stop 0 start 1 finish 0
1285924846 finish default 1
1285924846 stop default
1285924846 start default 2 members 2 1
1285924846 do_recovery stop 1 start 2 finish 1
1285924846 finish default 2
1285924936 stop default
1285924985 client 3: dump
1285925065 client 3: dump
1285925281 client 3: dump
[root@clu5 ~]#
node2
[root@clu6 ~]# group_tool
type             level name       id       state
fence            0     default    00000000 JOIN_STOP_WAIT
[1 2]
dlm              1     clvmd      00000000 JOIN_STOP_WAIT
[1 2]
[root@clu6 ~]#
[root@clu6 ~]#
[root@clu6 ~]# group_tool dump fence
1285924935 our_nodeid 2 our_name clu6.snt.si
1285924935 listen 4 member 5 groupd 7
1285924936 client 3: join default
1285924936 delay post_join 3s post_fail 0s
1285924936 added 2 nodes from ccs
1285925291 client 3: dump
[root@clu6 ~]#
thx
br jost
________________________________________
From: linux-cluster-bounces@xxxxxxxxxx [linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Joel Heenan [joelh@xxxxxxxxxxxxxx]
Sent: Friday, October 01, 2010 7:09 AM
To: linux clustering
Subject: Re: fence in xen
Are you saying that if you manually destroy the guest, then start it up it works?
I don't think your problem is with fencing; I think it's that the two guests are not joining correctly. It seems like the fencing part is working.
Do the logs in /var/log/messages show that one node successfully fenced the other? What is the output of group_tool on both nodes after they have come up? This should help you debug it.
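For example, something along the lines of:

grep fence /var/log/messages

on each node should show whether fenced attempted the fence and whether it reported success.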
I don't think it's relevant, but this item from the FAQ may help:
http://sources.redhat.com/cluster/wiki/FAQ/Fencing#fence_stuck
Joel
From: linux-cluster-bounces@xxxxxxxxxx [linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Rakovec Jost [Jost.Rakovec@xxxxxx]
On Wed, Sep 22, 2010 at 7:08 PM, Rakovec Jost <Jost.Rakovec@xxxxxx> wrote:
Hi
Anybody have any idea? Please help!!
Now I can fence the node, but after booting it can't connect to the cluster.
on dom0
fence_xvmd -LX -I xenbr0 -U xen:/// -fdddddddddddddd
ipv4_connect: Connecting to client
ipv4_connect: Success; fd = 12
Rebooting domain oelcl21...
[REBOOT] Calling virDomainDestroy(0x99cede0)
libvir: Xen error : Domain not found: xenUnifiedDomainLookupByName
[[ XML Domain Info ]]
<domain type='xen' id='41'>
  <name>oelcl21</name>
  <uuid>07e31b27-1ff1-4754-4f58-221e8d2057d6</uuid>
  <memory>1048576</memory>
  <currentMemory>1048576</currentMemory>
  <vcpu>2</vcpu>
  <bootloader>/usr/bin/pygrub</bootloader>
  <os>
    <type>linux</type>
  </os>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <disk type='block' device='disk'>
      <driver name='phy'/>
      <source dev='/dev/vg_datastore/oelcl21'/>
      <target dev='xvda' bus='xen'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='phy'/>
      <source dev='/dev/vg_datastore/skupni1'/>
      <target dev='xvdb' bus='xen'/>
      <shareable/>
    </disk>
    <interface type='bridge'>
      <mac address='00:16:3e:7c:60:aa'/>
      <source bridge='xenbr0'/>
      <script path='/etc/xen/scripts/vif-bridge'/>
      <target dev='vif41.0'/>
    </interface>
    <console type='pty' tty='/dev/pts/2'>
      <source path='/dev/pts/2'/>
      <target port='0'/>
    </console>
  </devices>
</domain>
[[ XML END ]]
Calling virDomainCreateLinux()..
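(Side note: the "Domain not found" message above is presumably just the lookup after virDomainDestroy(), but it is worth confirming that the guest name fence_xvmd is asked to act on matches exactly what the hypervisor reports, e.g. on dom0:

virsh -c xen:/// list --all
xm list

and comparing those names against the domain= value used by the fence device in the domU cluster.conf.)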
on domU -node1
fence_xvm -H oelcl21 -ddd
clustat on node1:
[root@oelcl11 ~]# clustat
Cluster Status for cluster2 @ Wed Sep 22 11:04:49 2010
Member Status: Quorate
 Member Name                        ID   Status
 ------ ----                        ---- ------
 oelcl11                               1 Online, Local, rgmanager
 oelcl21                               2 Online, rgmanager

 Service Name              Owner (Last)                   State
 ------- ----              ----- ------                   -----
 service:web               oelcl11                        started
[root@oelcl11 ~]#
But node2 waits for 300s and can't connect:
Starting daemons... done
Starting fencing... Sep 22 10:41:06 oelcl21 kernel: eth0: no IPv6 routers present
done
[ OK ]
[root@oelcl21 ~]# clustat
Cluster Status for cluster2 @ Wed Sep 22 11:04:19 2010
Member Status: Quorate
 Member Name                        ID   Status
 ------ ----                        ---- ------
 oelcl11                               1 Online
 oelcl21                               2 Online, Local
[root@oelcl21 ~]#
br
jost
________________________________________
From: linux-cluster-bounces@xxxxxxxxxx [linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Rakovec Jost [Jost.Rakovec@xxxxxx]
Sent: Monday, September 13, 2010 9:31 AM
To: linux clustering
Subject: Re: fence in xen
Hi
Q: Must fence_xvmd also run in domU?
Because I notice that if I run this on the host while fence_xvmd is running:
[root@oelcl1 ~]# fence_xvm -H oelcl2 -ddd -o null
Debugging threshold is now 3
-- args @ 0x7fffe3f71fb0 --
args->addr = 225.0.0.12
args->domain = oelcl2
args->key_file = /etc/cluster/fence_xvm.key
args->op = 0
args->hash = 2
args->auth = 2
args->port = 1229
args->ifindex = 0
args->family = 2
args->timeout = 30
args->retr_time = 20
args->flags = 0
args->debug = 3
-- end args --
Reading in key file /etc/cluster/fence_xvm.key into 0x7fffe3f70f60 (4096 max size)
Actual key length = 4096 bytes
Sending to 225.0.0.12 via 127.0.0.1
Sending to 225.0.0.12 via 10.9.131.80
Sending to 225.0.0.12 via 10.9.131.83
Sending to 225.0.0.12 via 192.168.122.1
Waiting for connection from XVM host daemon.
Issuing TCP challenge
Responding to TCP challenge
TCP Exchange + Authentication done...
Waiting for return value from XVM host
Remote: Operation was successful
But if I try to fence --> reboot, then I get:
[root@oelcl1 ~]# fence_xvm -H oelc2
Remote: Operation was successful
[root@oelcl1 ~]#
But host2 does not reboot.
If fence_xvmd is not running on the host, then I get a timeout:
[root@oelcl1 sysconfig]# fence_xvm -H oelcl2 -ddd -o null
Debugging threshold is now 3
-- args @ 0x7fff1a6b5580 --
args->addr = 225.0.0.12
args->domain = oelcl2
args->key_file = /etc/cluster/fence_xvm.key
args->op = 0
args->hash = 2
args->auth = 2
args->port = 1229
args->ifindex = 0
args->family = 2
args->timeout = 30
args->retr_time = 20
args->flags = 0
args->debug = 3
-- end args --
Reading in key file /etc/cluster/fence_xvm.key into 0x7fff1a6b4530 (4096 max size)
Actual key length = 4096 bytes
Sending to 225.0.0.12 via 127.0.0.1
Sending to 225.0.0.12 via 10.9.131.80
Waiting for connection from XVM host daemon.
Sending to 225.0.0.12 via 127.0.0.1
Sending to 225.0.0.12 via 10.9.131.80
Waiting for connection from XVM host daemon.
Q: How can I check whether multicast is OK?
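One way to check (assuming tcpdump is available on dom0) is to watch for the fence_xvm multicast traffic on the bridge while running the agent from a domU, e.g.:

# on dom0
tcpdump -n -i xenbr0 host 225.0.0.12 and port 1229

# on a domU, in another window
fence_xvm -o null -H oelcl2

If no packets for 225.0.0.12 show up on the interface fence_xvmd listens on, the requests are going out a different interface or being dropped on the way.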
Q: On which network interface must fence_xvmd run on dom0? I notice that on the hosts (domU) there is:
virbr0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:40 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:7212 (7.0 KiB)
so virbr0 is also present.
And on dom0 (vm5):
[root@vm5 ~]# fence_xvmd -fdd -I xenbr0
-- args @ 0xbfd26234 --
args->addr = 225.0.0.12
args->domain = (null)
args->key_file = /etc/cluster/fence_xvm.key
args->op = 2
args->hash = 2
args->auth = 2
args->port = 1229
args->ifindex = 7
args->family = 2
args->timeout = 30
args->retr_time = 20
args->flags = 1
args->debug = 2
-- end args --
Opened ckpt vm_states
My Node ID = 1
Domain                   UUID                                 Owner State
------                   ----                                 ----- -----
Domain-0                 00000000-0000-0000-0000-000000000000 00001 00001
oelcl1                   2a53022c-5836-68f0-4514-02a5a0b07e81 00001 00002
oelcl2                   dd268dd4-f012-e0f7-7c77-aa8a58e1e6ab 00001 00002
oelcman                  09c783bd-9107-0916-ebbf-bd27bcc8babe 00001 00002
Storing oelcl1
Storing oelcl2
[root@vm5 ~]# fence_xvmd -fdd -I virbr0
-- args @ 0xbfd26234 --
args->addr = 225.0.0.12
args->domain = (null)
args->key_file = /etc/cluster/fence_xvm.key
args->op = 2
args->hash = 2
args->auth = 2
args->port = 1229
args->ifindex = 7
args->family = 2
args->timeout = 30
args->retr_time = 20
args->flags = 1
args->debug = 2
-- end args --
Opened ckpt vm_states
My Node ID = 1
Domain                   UUID                                 Owner State
------                   ----                                 ----- -----
Domain-0                 00000000-0000-0000-0000-000000000000 00001 00001
oelcl1                   2a53022c-5836-68f0-4514-02a5a0b07e81 00001 00002
oelcl2                   dd268dd4-f012-e0f7-7c77-aa8a58e1e6ab 00001 00002
oelcman                  09c783bd-9107-0916-ebbf-bd27bcc8babe 00001 00002
Storing oelcl1
Storing oelcl2
No matter which interface I use, the fence is not done.
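(One thing that stands out above: args->ifindex is 7 in both runs, whether -I xenbr0 or -I virbr0 was given. Something like

ip link show

on dom0 would show which interface actually has index 7, which may help confirm where fence_xvmd is really listening for the multicast.)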
thx
br jost
_____________________________________
Sent: Saturday, September 11, 2010 6:36 PM
To: linux-cluster@xxxxxxxxxx
Subject: fence in xen
Hi list!
I have a question about fence_xvm.
Situation is:
One physical server with Xen --> dom0 with 2 domUs. The cluster works fine between the domUs -- reboot, relocate.
I'm using Red Hat 5.5.
The problem is with fencing from dom0 with "fence_xvm -H oelcl2": the domU is destroyed, but when it is booted back it can't join the cluster. The domU takes a very long time to boot --> FENCED_START_TIMEOUT=300.
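(If it helps while debugging, I believe FENCED_START_TIMEOUT is read by the cman init script and can be overridden in /etc/sysconfig/cman rather than edited in the script itself, e.g.:

# /etc/sysconfig/cman -- shorten the fence-domain join wait while testing
FENCED_START_TIMEOUT=60

This only shortens the wait; it does not fix whatever keeps the node from joining.)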
On the console I get this after node2 is up:
node2:
INFO: task clurgmgrd:2127 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
clurgmgrd D 0000000000000010 0 2127 2126 (NOTLB)
ffff88006f08dda8 0000000000000286 ffff88007cc0b810 0000000000000000
0000000000000003 ffff880072009860 ffff880072f6b0c0 00000000000455ec
ffff880072009a48 ffffffff802649d7
Call Trace:
[<ffffffff802649d7>] _read_lock_irq+0x9/0x19
[<ffffffff8021420e>] filemap_nopage+0x193/0x360
[<ffffffff80263a7e>] __mutex_lock_slowpath+0x60/0x9b
[<ffffffff80263ac8>] .text.lock.mutex+0xf/0x14
[<ffffffff88424b64>] :dlm:dlm_new_lockspace+0x2c/0x860
[<ffffffff80222b08>] __up_read+0x19/0x7f
[<ffffffff802d0abb>] __kmalloc+0x8f/0x9f
[<ffffffff8842b6fa>] :dlm:device_write+0x438/0x5e5
[<ffffffff80217377>] vfs_write+0xce/0x174
[<ffffffff80217bc4>] sys_write+0x45/0x6e
[<ffffffff802602f9>] tracesys+0xab/0xb6
During boot on node2:
Starting clvmd: dlm: Using TCP for communications
clvmd startup timed out
[FAILED]
node2:
[root@oelcl2 init.d]# clustat
Cluster Status for cluster1 @ Sat Sep 11 18:11:21 2010
Member Status: Quorate
 Member Name                        ID   Status
 ------ ----                        ---- ------
 oelcl1                                1 Online
 oelcl2                                2 Online, Local
[root@oelcl2 init.d]#
on first node:
[root@oelcl1 ~]# clustat
Cluster Status for cluster1 @ Sat Sep 11 18:12:07 2010
Member Status: Quorate
 Member Name                        ID   Status
 ------ ----                        ---- ------
 oelcl1                                1 Online, Local, rgmanager
 oelcl2                                2 Online, rgmanager

 Service Name              Owner (Last)                   State
 ------- ----              ----- ------                   -----
 service:webby             oelcl1                         started
[root@oelcl1 ~]#
And then I have to destroy both domUs and create them again to get node2 working.
I have followed the how-tos at https://access.redhat.com/kb/docs/DOC-5937 and http://sources.redhat.com/cluster/wiki/VMClusterCookbook.
cluster config on dom0
<?xml version="1.0"?>
<cluster alias="vmcluster" config_version="1" name="vmcluster">
    <clusternodes>
        <clusternode name="vm5" nodeid="1" votes="1"/>
    </clusternodes>
    <cman/>
    <fencedevices/>
    <rm/>
    <fence_xvmd/>
</cluster>
cluster config on domU
<?xml version="1.0"?>
<cluster alias="cluster1" config_version="49" name="cluster1">
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="4"/>
    <clusternodes>
        <clusternode name="oelcl1.name.comi" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device domain="oelcl1" name="xenfence1"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="oelcl2.name.com" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device domain="oelcl2" name="xenfence1"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman expected_votes="1" two_node="1"/>
    <fencedevices>
        <fencedevice agent="fence_xvm" name="xenfence1"/>
    </fencedevices>
    <rm>
        <failoverdomains>
            <failoverdomain name="prefer_node1" nofailback="0" ordered="1" restricted="1">
                <failoverdomainnode name="oelcl1.name.com" priority="1"/>
                <failoverdomainnode name="oelcl2.name.com" priority="2"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="xx.xx.xx.xx" monitor_link="1"/>
            <fs device="/dev/xvdb1" force_fsck="0" force_unmount="0" fsid="8669" fstype="ext3" mountpoint="/var/www/html" name="docroot" self_fence="0"/>
            <script file="/etc/init.d/httpd" name="apache_s"/>
        </resources>
        <service autostart="1" domain="prefer_node1" exclusive="0" name="webby" recovery="relocate">
            <ip ref="xx.xx.xx.xx"/>
            <fs ref="docroot"/>
            <script ref="apache_s"/>
        </service>
    </rm>
</cluster>
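With this config in place, one way to test fencing through the cluster stack itself (rather than calling fence_xvm by hand) is to ask fenced to do it from the surviving node, using the node name as it appears in cluster.conf, e.g.:

fence_node oelcl2.name.com

If that works but the automatic fence does not, the problem is more likely in membership/startup than in the fence agent itself.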
Fence processes on dom0:
[root@vm5 cluster]# ps -ef |grep fenc
root 18690 1 0 17:40 ? 00:00:00 /sbin/fenced
root 18720 1 0 17:40 ? 00:00:00 /sbin/fence_xvmd -I xenbr0
root 22633 14524 0 18:21 pts/3 00:00:00 grep fenc
[root@vm5 cluster]#
and on domU
[root@oelcl1 ~]# ps -ef|grep fen
root 1523 1 0 17:41 ? 00:00:00 /sbin/fenced
root 13695 2902 0 18:22 pts/0 00:00:00 grep fen
[root@oelcl1 ~]#
Does somebody have any idea why fencing doesn't work?
thx
br
jost
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster