I am building a cluster of Xen guests whose root file systems reside in
files on a GFS filesystem, which in turn sits on an iSCSI device; each
cluster node mounts that GFS filesystem. For performance reasons, both
the iSCSI target and the physical nodes (which are themselves part of a
cluster) use two gigabit Ethernet interfaces with bonding and LACP.
On the physical machines I had to insert a "sleep 30" in the
/etc/init.d/iscsi script before the iSCSI login, to wait for the bond
interface to come up; otherwise the iSCSI devices are not seen and the
GFS mount fails.
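For what it's worth, instead of a fixed sleep, the init script could poll the bond's link state and continue as soon as it is up. A minimal sketch; the interface name (bond0), the timeout, and the helper name are assumptions to adapt to your setup:

```shell
#!/bin/sh
# Hypothetical helper for /etc/init.d/iscsi: wait until the bond interface
# reports link up, instead of sleeping a fixed 30 seconds.
wait_for_iface() {
    iface=$1
    timeout=${2:-30}
    i=0
    while [ "$i" -lt "$timeout" ]; do
        # operstate reads "up" once the link (and LACP negotiation) is ready
        if [ "$(cat /sys/class/net/"$iface"/operstate 2>/dev/null)" = "up" ]; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# In /etc/init.d/iscsi, just before the iscsi login:
#   wait_for_iface bond0 30 || echo "warning: bond0 still down after 30s"
```

This is at worst equivalent to the fixed sleep (it gives up after the same 30 seconds) and at best much faster when the bond negotiates quickly.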
The cluster of Xen guests itself works fine: I can migrate each guest to
a different physical node without problems on the guest. But when I
reboot or fence a guest, the guest cluster breaks: quorum is dissolved
and I have to fence ALL the nodes and reboot them to get the cluster
running again. Could this be the Xen bridge going down and coming back
up for longer than the heartbeat timeout?
Is this entry in the FAQ still valid (and therefore the solution to the
problems I found)?
When I reboot a xen dom, I get cluster errors and it gets fenced. What's
going on and how do I fix it?
As I understand it, the problem is due to the fact that Xen nodes tear
down and rebuild the Ethernet NIC after the cluster suite has started.
We're working on a more permanent solution. In the meantime, here is a
workaround:
1. Edit the file /etc/xen/xend-config.sxp. Locate the line that
reads:
(network-script network-bridge)
Change that line to read:
(network-script /bin/true)
2. Create and/or edit file /etc/sysconfig/network-scripts/ifcfg-eth0
to look something like:
DEVICE=eth0
ONBOOT=yes
BRIDGE=xenbr0
HWADDR=XX:XX:XX:XX:XX:XX
Where XX:XX:XX:XX:XX:XX is the MAC address of your network card.
3. Create and/or edit file
/etc/sysconfig/network-scripts/ifcfg-xenbr0 to look something like:
DEVICE=xenbr0
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.0.0.116
NETMASK=255.255.255.0
GATEWAY=10.0.0.254
TYPE=Bridge
DELAY=0
Substitute your appropriate IP address, netmask and gateway
information.
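If the bridge teardown really does outlast the heartbeat timeout, one way to test that theory (a diagnostic, not a fix) is to raise the totem token timeout in /etc/cluster/cluster.conf. The cluster name and the value below are purely illustrative:

```xml
<!-- /etc/cluster/cluster.conf fragment (illustrative values): allow up
     to 30 s without a token before a node is declared dead -->
<cluster name="guestclust" config_version="2">
  <totem token="30000"/>
  <!-- ... rest of the cluster configuration ... -->
</cluster>
```

If the guest cluster survives a reboot with a longer token timeout, that points at the bridge-rebuild window as the cause.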
Thanks, Paolo
Paolo Marini
Prisma Engineering srl, via Petrocchi 4, 20152 Milano, Italy
tel +39 02 26113507 | fax +39 02 26113597 | cell +39 335 6525835
paolom@xxxxxxxxxxxxx | http://www.prisma-eng.com
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster