> From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Agnieszka Kukalowicz
> Sent: Monday, February 11, 2008 4:56 PM
> To: linux-cluster@xxxxxxxxxx
> Subject: Fence_xvmd/fence_xvm problem
>
> Hi,
>
> I was trying to configure Xen guests as virtual services under Cluster Suite. My configuration is simple:
>
> Node one "d1" runs a xen guest as virtual service "vm_service1", and node two "d2" runs virtual service
> "vm_service2".
>
> The /etc/cluster/cluster.conf file is below:
>
> <?xml version="1.0"?>
> <cluster alias="VM_Data_Cluster" config_version="112" name="VM_Data_Cluster">
>   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="300"/>
>   <clusternodes>
>     <clusternode name="d1" nodeid="1" votes="1">
>       <multicast addr="225.0.0.1" interface="eth0"/>
>       <fence>
>         <method name="1">
>           <device name="apc_power_switch" port="1"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="d2" nodeid="2" votes="1">
>       <multicast addr="225.0.0.1" interface="eth0"/>
>       <fence>
>         <method name="1">
>           <device name="apc_power_switch" port="2"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <cman expected_votes="1" two_node="1">
>     <multicast addr="225.0.0.1"/>
>   </cman>
>   <fencedevices>
>     <fencedevice agent="fence_apc" ipaddr="X.X.X.X" login="apc" name="apc_power_switch"
>       passwd="apc"/>
>   </fencedevices>
>   <rm>
>     <failoverdomains>
>       <failoverdomain name="VM_d1_failover" ordered="0" restricted="0">
>         <failoverdomainnode name="d1" priority="1"/>
>       </failoverdomain>
>       <failoverdomain name="VM_d2_failover" ordered="0" restricted="0">
>         <failoverdomainnode name="d2" priority="1"/>
>       </failoverdomain>
>     </failoverdomains>
>     <resources/>
>     <vm autostart="1" domain="VM_d1_failover" exclusive="0" name="vm_service1"
>       path="/virts/service1" recovery="relocate"/>
>     <vm autostart="1" domain="VM_d2_failover" exclusive="0" name="vm_service2"
>       path="/virts/service2" recovery="relocate"/>
>   </rm>
>   <totem consensus="4800" join="60" token="10000"
>     token_retransmits_before_loss_const="20"/>
>   <fence_xvmd family="ipv4"/>
> </cluster>
>
> On guests "vm_service1" and "vm_service2" I have configured a second cluster.
>
> <cluster alias="SV_Data_Cluster" config_version="29" name="SV_Data_Cluster">
>   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>   <clusternodes>
>     <clusternode name="d11" nodeid="1" votes="1">
>       <fence>
>         <method name="1">
>           <device domain="d11" name="virtual_fence"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="d12" nodeid="2" votes="1">
>       <fence>
>         <method name="1">
>           <device domain="d12" name="virtual_fence"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <cman expected_votes="1" two_node="1"/>
>   <fencedevices>
>     <fencedevice agent="fence_xvm" name="virtual_fence"/>
>   </fencedevices>
>   <rm>
>     ...
>   </rm>
> </cluster>
>
> The problem is that the fence_xvmd/fence_xvm mechanism doesn't work, probably due to a misconfiguration of
> multicast.
>
> Physical nodes "d1" and "d2" and xen guests "vm_service1" and "vm_service2" have two ethernet interfaces:
> private 10.0.200.x (eth0) and public (eth1).
>
> On the physical nodes, the "fence_xvmd" daemon listens on the eth1 interface by default:
> [root@d2 ~]# netstat -g
> IPv6/IPv4 Group Memberships
> Interface       RefCnt Group
> --------------- ------ ---------------------
> lo              1      ALL-SYSTEMS.MCAST.NET
> eth0            1      225.0.0.1
> eth0            1      ALL-SYSTEMS.MCAST.NET
> eth1            1      225.0.0.12
> eth1            1      ALL-SYSTEMS.MCAST.NET
> virbr0          1      ALL-SYSTEMS.MCAST.NET
> lo              1      ff02::1
> ....
>
> Next, when I run a test on xen guest "vm_service1" to fence guest "vm_service2", I get:
>
> [root@d11 cluster]# /sbin/fence_xvm -H d12 -ddddd
> Debugging threshold is now 5
> -- args @ 0xbf8aea70 --
> args->addr = 225.0.0.12
> args->domain = d12
> args->key_file = /etc/cluster/fence_xvm.key
> args->op = 2
> args->hash = 2
> args->auth = 2
> args->port = 1229
> args->family = 2
> args->timeout = 30
> args->retr_time = 20
> args->flags = 0
> args->debug = 5
> -- end args --
> Reading in key file /etc/cluster/fence_xvm.key into 0xbf8ada1c (4096 max size)
> Actual key length = 4096 bytesOpening /dev/urandom
> Sending to 225.0.0.12 via 127.0.0.1
> Opening /dev/urandom
> Sending to 225.0.0.12 via X.X.X.X
> Opening /dev/urandom
> Sending to 225.0.0.12 via 10.0.200.124
> Waiting for connection from XVM host daemon.
> ....
> Waiting for connection from XVM host daemon.
> Timed out waiting for response
>
> On the node "d2", where "vm_service2" is running, I get:
>
> [root@d2 ~]# /sbin/fence_xvmd -fddd
> Debugging threshold is now 3
> -- args @ 0xbfc54e3c --
> args->addr = 225.0.0.12
> args->domain = (null)
> args->key_file = /etc/cluster/fence_xvm.key
> args->op = 2
> args->hash = 2
> args->auth = 2
> args->port = 1229
> args->family = 2
> args->timeout = 30
> args->retr_time = 20
> args->flags = 1
> args->debug = 3
> -- end args --
> Reading in key file /etc/cluster/fence_xvm.key into 0xbfc53e3c (4096 max size)
> Actual key length = 4096 bytesOpened ckpt vm_states
> My Node ID = 1
> Domain      UUID                                 Owner State
> ------      ----                                 ----- -----
> Domain-0    00000000-0000-0000-0000-000000000000 00001 00001
> vm_service2 2dd8193f-e4d4-f41c-a4af-f5b30d19fe00 00001 00001
> Storing vm_service2
> Domain      UUID                                 Owner State
> ------      ----                                 ----- -----
> Domain-0    00000000-0000-0000-0000-000000000000 00001 00001
> vm_service2 2dd8193f-e4d4-f41c-a4af-f5b30d19fe00 00001 00001
> Storing vm_service2
> Request to fence: d12.
> Evaluating Domain: d12 Last Owner/State Unknown
> Domain      UUID                                 Owner State
> ------      ----                                 ----- -----
> Domain-0    00000000-0000-0000-0000-000000000000 00001 00001
> vm_service2 2dd8193f-e4d4-f41c-a4af-f5b30d19fe00 00001 00001
> Storing vm_service2
> Request to fence: d12
> Evaluating Domain: d12 Last Owner/State Unknown
>
> So it looks like fence_xvmd and fence_xvm cannot communicate with each other.
> But "fence_xvm" on "vm_service1" sends multicast packets through all interfaces, and node "d2" can receive them.
> Tcpdump on node "d2" shows that the node receives the packets:
>
> [root@d2 ~]# tcpdump -i peth0 -n host 225.0.0.12
> listening on peth0, link-type EN10MB (Ethernet), capture size 96 bytes
> 17:50:47.972477 IP 10.0.200.124.filenet-pch > 225.0.0.12.novell-zfs: UDP, length 176
> 17:50:49.960841 IP 10.0.200.124.filenet-pch > 225.0.0.12.novell-zfs: UDP, length 176
> 17:50:51.977425 IP 10.0.200.124.filenet-pch > 225.0.0.12.novell-zfs: UDP, length 176
>
> [root@d2 ~]# tcpdump -i peth1 -n host 225.0.0.12
> listening on peth1, link-type EN10MB (Ethernet), capture size 96 bytes
> 17:51:26.168132 IP X.X.X.X.filenet-pch > 225.0.0.12.novell-zfs: UDP, length 176
> 17:51:28.184802 IP X.X.X.X.filenet-pch > 225.0.0.12.novell-zfs: UDP, length 176
> 17:51:30.196875 IP X.X.X.X.filenet-pch > 225.0.0.12.novell-zfs: UDP, length 176
>
> But I can't see node "d2" sending anything back to xen guest "vm_service1", so "fence_xvm" times out.
> What am I doing wrong?
>
> Cheers
>
> Agnieszka Kukałowicz
> NASK, Polska.pl

Hi,

Can you show the results of "netstat -nr" as well?

Regards,
Bernard Chew

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
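
Besides the routing table, it can help to check, independently of fence_xvmd, which host interface actually delivers the fence_xvm requests to a listening socket. The following is only a minimal sketch in Python, not part of the cluster tools. The group 225.0.0.12 and port 1229 are taken from the debug output above; the helper names build_mreq and listen are hypothetical, and the sketch assumes fence_xvmd is stopped on the host (or that the port can be shared) while the test runs.

```python
import socket

GROUP = "225.0.0.12"  # multicast address shown in the fence_xvm debug output
PORT = 1229           # port shown in the fence_xvm debug output


def build_mreq(group: str, local_ip: str) -> bytes:
    """Pack a struct ip_mreq: the multicast group, then the local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(local_ip)


def listen(local_ip: str, group: str = GROUP, port: int = PORT, timeout: float = 30.0):
    """Join `group` on the interface that owns `local_ip` and wait for one datagram.

    Returns (sender_ip, payload_length), or None if nothing arrives in time.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        # Joining with an explicit interface address avoids relying on the
        # routing table to pick the interface for the membership.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        build_mreq(group, local_ip))
        sock.settimeout(timeout)
        try:
            data, sender = sock.recvfrom(4096)
        except socket.timeout:
            return None
        return sender[0], len(data)
    finally:
        sock.close()


# Example (run on the physical node while fence_xvm is sending from the guest;
# replace the address with the node's own eth0 or eth1 address):
#   print(listen("10.0.200.1"))
```

If a call with the eth0 address returns a sender while the same call with the eth1 address times out (or the reverse), that suggests the group membership and the incoming requests are on different interfaces, which is what the "netstat -g" and tcpdump output above hint at.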