Good! It seems to be the right solution. My answers/comments are below.
Thanks, Paolo
On Wed, 2007-12-12 at 19:23 +0100, Paolo Marini wrote:
I am repeating my request for help, hoping someone has run into (and
hopefully solved) the same issues.
I am building a cluster of Xen guests whose root file systems reside in
files on a GFS filesystem (backed by iSCSI). Each cluster node mounts a
GFS file system residing on an iSCSI device.
For performance reasons, both the iSCSI device and the physical nodes
(which also form a cluster of their own) use two Gigabit Ethernet links
with bonding and LACP.
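For reference, the bonding side on a RHEL-style system looks roughly
like this (a sketch only; device names and addresses are placeholders):

    # /etc/modprobe.conf
    alias bond0 bonding
    options bond0 mode=802.3ad miimon=100

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=10.0.0.1
    NETMASK=255.255.255.0
    ONBOOT=yes

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (likewise for eth1)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes

mode=802.3ad is the bonding driver's LACP mode; miimon enables link
monitoring.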
For the physical machines, I had to insert a sleep 30 in the
/etc/init.d/iscsi script before the iSCSI login, to give the bond
interface time to come up; otherwise the iSCSI devices are not seen and
no GFS mount is possible.
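A minimal sketch of that workaround (assuming a stock RHEL-style init
script; where exactly the login happens varies by version):

    # in /etc/init.d/iscsi, just before the iSCSI login step:
    # bond0 needs time to negotiate LACP, otherwise the targets
    # are not visible yet
    sleep 30

A fixed sleep is a blunt instrument; polling the link state in
/proc/net/bonding/bond0 until the aggregator is up would be more
robust.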
Moving on to the cluster of Xen guests: they work fine, and I am able
to migrate each one to a different physical node without problems on
the guest.
When I reboot or fence one of the guests, the guest cluster breaks,
i.e. quorum is dissolved and I have to fence ALL the nodes and reboot
them in order for the cluster to restart.
How many guests - and what are you using for fencing?
I am using 5 guests - 4 are in a cluster and the remaining one is a
management node (Nagios etc.). I am fencing with fence_xvm, and it is
correctly configured and working. Each physical node is a Dell PE860
with 4 GB of RAM, one quad-core Xeon and 3 network interfaces; two are
used for bonding and the third is reserved for IPMI (which I use for
fencing the physical nodes).
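For anyone reproducing this, the guest-cluster side of fence_xvm is
declared in cluster.conf roughly as follows (a sketch; the node and
domain names are hypothetical):

    <fencedevices>
        <fencedevice name="xvm" agent="fence_xvm"/>
    </fencedevices>
    ...
    <clusternode name="guest1" nodeid="1">
        <fence>
            <method name="1">
                <device name="xvm" domain="guest1"/>
            </method>
        </fence>
    </clusternode>

fence_xvmd must also be running on the physical hosts, with the shared
key (/etc/cluster/fence_xvm.key) copied to the guests.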
The guests configure two network interfaces (eth0 plus an IP alias,
eth0:0); one is for private communication between the nodes and with
the iSCSI device, the other for public access to the nodes. I am not
using VLANs.
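The alias is just a second address on eth0; under RHEL-style networking
it is a single extra file (a sketch, with a hypothetical address):

    # /etc/sysconfig/network-scripts/ifcfg-eth0:0
    DEVICE=eth0:0
    IPADDR=10.0.0.11
    NETMASK=255.255.255.0
    ONBOOT=yes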
Does it have to do with the Xen bridge going up and down for longer
than the heartbeat timeout?
Not sure - it shouldn't be that big of a deal. If you think that's the
problem, try adding:

<totem token="30000"/>

to the VM cluster's cluster.conf.

-- Lon
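For context, the directive is a top-level child of <cluster> in
cluster.conf (a sketch; the cluster name and version are placeholders),
and the token value is in milliseconds, i.e. a 30-second timeout:

    <cluster name="vmcluster" config_version="2">
        <totem token="30000"/>
        <clusternodes>
            ...
        </clusternodes>
    </cluster>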
It seems much more stable now; more tests will confirm this. So far,
an xm destroy on a guest leaves the rest of the guest cluster up: it
detects the missing guest and fences it successfully, and the machine
restarts and rejoins the cluster.
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster