Re: CMAN removing nodes from cluster?

On Thu, 9 Jun 2005, Patrick Caulfield wrote:
It sounds like there's some internal routing problem. If the node gets kicked out after 10-15 seconds then no heartbeats are getting through at all.

Yeah, kind of what I figured.

Things to do:

Check that the broadcast messages are being sent onto the xen bridge.

Are these the packets on 6809/udp?

Sniffing on both eth0 on the VM and on the Xen bridge, I am seeing broadcasts from the two physical hosts (xen1 and xen2, 10.20.0.201 and 202):

08:52:58.953913 IP xen1.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28
08:53:03.893870 IP xen2.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28
08:53:03.953963 IP xen1.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28
08:53:08.893795 IP xen2.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28

..but even when the Xen VM is in the cluster, I see some unicast 6808, but no broadcast. Odd.. I'll have to investigate that.
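For anyone repeating this check, a saved capture can be tallied by script rather than by eye. The following is only a rough sketch, assuming tcpdump text output in the format shown above; the function names and the guest name `vm1.example` are hypothetical and are not part of CMAN or tcpdump.

```python
import re
from collections import defaultdict

# Matches lines in the tcpdump text format shown in the capture above, e.g.
# 08:52:58.953913 IP host.6809 > 10.20.0.255.6809: UDP, length: 28
LINE_RE = re.compile(
    r"^(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}\.\d+) IP "
    r"(?P<src>\S+)\.(?P<sport>\d+) > (?P<dst>\S+)\.(?P<dport>\d+): UDP"
)

# The four lines from the capture above.
SAMPLE = [
    "08:52:58.953913 IP xen1.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28",
    "08:53:03.893870 IP xen2.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28",
    "08:53:03.953963 IP xen1.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28",
    "08:53:08.893795 IP xen2.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28",
]


def broadcast_senders(lines, bcast="10.20.0.255", port=6809):
    """Map each source host to the timestamps (seconds since midnight)
    of the UDP packets it sent to bcast:port."""
    seen = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a UDP line in the expected format
        if m["dst"] != bcast or int(m["dport"]) != port:
            continue  # unicast or some other port
        t = int(m["h"]) * 3600 + int(m["m"]) * 60 + float(m["s"])
        seen[m["src"]].append(t)
    return dict(seen)


def missing_senders(lines, expected, bcast="10.20.0.255", port=6809):
    """Return the expected hosts that never appear as a broadcast source."""
    seen = broadcast_senders(lines, bcast=bcast, port=port)
    return sorted(set(expected) - set(seen))
```

Calling `missing_senders(SAMPLE, [...])` with the two physical hosts plus the guest's name would list the guest as missing, which matches what the sniff shows: only xen1 and xen2 are broadcasting.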

Check that all the nodes have the same broadcast address.

First thing I checked.
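One way to double-check the broadcast addresses is to derive each one from the node's address and prefix length and compare them. This is just an illustrative sketch: the /24 prefix is an assumption (it is consistent with the 10.20.0.255 broadcast seen in the capture), and the guest address 10.20.0.210 is hypothetical.

```python
import ipaddress


def broadcast_of(cidr):
    """Return the broadcast address implied by an 'address/prefix' string."""
    return ipaddress.ip_interface(cidr).network.broadcast_address


def same_broadcast(cidrs):
    """True if every 'address/prefix' string implies the same broadcast address."""
    return len({broadcast_of(c) for c in cidrs}) == 1


# Physical hosts from the capture; prefix length assumed to be /24.
# 10.20.0.210 is a made-up guest address for illustration.
nodes_ok = ["10.20.0.201/24", "10.20.0.202/24", "10.20.0.210/24"]

# Same addresses, but the guest has a wrong netmask (hypothetical fault).
nodes_bad = ["10.20.0.201/24", "10.20.0.202/24", "10.20.0.210/16"]
```

Here `same_broadcast(nodes_ok)` holds, while the mismatched netmask in `nodes_bad` gives the guest a different broadcast address (10.20.255.255), so its broadcasts would not go to the address the other nodes use.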

Use cman_tool status to check the node addresses in use by each of the virtual machines.

Sees the correct address on each node.

tcpdump is the thing here.

Yeah, tcpdump rules.  :)

If you compiled the modules yourself then make sure you used ARCH=xen on the make command-line, or all the timeouts will be way out. If you're using the Fedora packages, make sure you have a version newer than cman-kernel-xenU-2.6.11.4-20050517.141233

I built them myself, based on the RHEL4 tree. I'm 99.9% sure that I did pass ARCH=xen on all the builds, but I'll rebuild, just to make sure.

I do know that Xen clusters work because I'm using it here!

That's why it's so odd!

What version of Xen are you using? I just upgraded to a recent snapshot of 3.0 (with 2.0, GFS was causing the kernel to crash. Lovely.)

------------------------------------------------------------------------
| nate carlson | natecars@xxxxxxxxxxxxxxx | http://www.natecarlson.com |
|       depriving some poor village of its idiot since 1981            |
------------------------------------------------------------------------

--

Linux-cluster@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/linux-cluster
