Nate Carlson wrote:
> On Thu, 9 Jun 2005, Patrick Caulfield wrote:
>
>> It sounds like there's some internal routing problem. If the node gets
>> kicked out after 10-15 seconds then no heartbeats are getting through
>> at all.
>
> Yeah, kind of what I figured.
>
>> Things to do:
>>
>> Check that the broadcast messages are being sent onto the xen bridge.
>
> Are these the packets on 6809/udp?

Yes

> Sniffing on both eth0 on the VM and on the Xen bridge, I am seeing
> broadcasts from the two physical hosts (xen1 and xen2, 10.20.0.201 and
> 202):
>
> 08:52:58.953913 IP xen1.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28
> 08:53:03.893870 IP xen2.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28
> 08:53:03.953963 IP xen1.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28
> 08:53:08.893795 IP xen2.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28
>
> ...but even when the Xen VM is in the cluster, I see some unicast 6808,
> but no broadcast. Odd... I'll have to investigate that.

There are unicast as well as broadcast messages; that's why they can see
each other to start with. I wonder if something is filtering out the
broadcasts: is there any iptables filtering on ? I seem to remember having
to turn off antispoof when starting the Xen networking.

So (if I read that correctly) the physical hosts are OK but the VM doesn't
want to play?

If broadcast really seems not to work then you could always try multicast...

>> Check that all the nodes have the same broadcast address.
>
> First thing I checked.
>
>> Use cman_tool status to check the node addresses in use by each of the
>> virtual machines.
>
> Sees the correct address on each node.
>
>> tcpdump is the thing here.
>
> Yeah, tcpdump rules. :)
>
>> If you compiled the modules yourself then make sure you used ARCH=xen
>> on the make command-line or all the timeouts are way out.
>> If you're using the Fedora packages, make sure you have
>>
>> cman-kernel-xenU-2.6.11.4-20050517.141233
>
> I built them myself, based on the RHEL4 tree. I'm 99.9% sure that I did
> pass ARCH=xen on all the builds, but I'll rebuild, just to make sure.
>
>> I do know that Xen clusters work because I'm using it here!
>
> That's why it's so odd!
>
> What version of Xen are you using? I just upgraded to a recent snapshot
> of 3.0 (with 2.0, GFS was causing the kernel to crash. Lovely.)

I'm running a slightly old Xen 3.0 snapshot.

-- 

patrick

--
Linux-cluster@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/linux-cluster
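The tcpdump capture in this thread also shows how often each host sends its 6809/udp broadcast. What follows is a minimal Python sketch, not part of cman. It reads capture lines in the form quoted above, with each packet joined onto one line, and reports the gap between consecutive packets per sender. It assumes the capture does not cross midnight, because tcpdump's default timestamp carries no date. For the four packets quoted, each host broadcasts roughly every 5 seconds, so a 10-15 second eviction window covers about two to three heartbeat intervals.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches tcpdump lines such as:
# 08:52:58.953913 IP xen1.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28
# Group 1 is the timestamp; group 2 is the sender with its ".6809" suffix stripped.
PACKET = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{6}) IP (\S+)\.6809 > \S+\.6809:")


def heartbeat_gaps(lines):
    """Return {sender: [seconds between consecutive 6809/udp packets]}."""
    times = defaultdict(list)
    for line in lines:
        m = PACKET.match(line)
        if m is None:
            continue  # skip anything that is not a port-6809 packet line
        # tcpdump's default timestamp has no date, so this assumes the
        # capture stays within a single day.
        times[m.group(2)].append(datetime.strptime(m.group(1), "%H:%M:%S.%f"))
    return {
        host: [round((b - a).total_seconds(), 3) for a, b in zip(ts, ts[1:])]
        for host, ts in times.items()
    }


capture = [
    "08:52:58.953913 IP xen1.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28",
    "08:53:03.893870 IP xen2.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28",
    "08:53:03.953963 IP xen1.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28",
    "08:53:08.893795 IP xen2.int.technicality.org.6809 > 10.20.0.255.6809: UDP, length: 28",
]
gaps = heartbeat_gaps(capture)
print(gaps)
# {'xen1.int.technicality.org': [5.0], 'xen2.int.technicality.org': [5.0]}
```

You could run the same function over a capture taken inside the VM. A sender that is missing from the result, or gaps much longer than the physical hosts', would point at the broadcasts being dropped on the way to the guest.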
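The "same broadcast address on every node" check can also be scripted rather than done by eye. This is a minimal Python sketch that derives each node's broadcast address from its address and prefix length using the standard `ipaddress` module, then flags any node that disagrees with the majority. The xen1/xen2 addresses come from the capture in the thread. The /24 prefix is an assumption inferred from the 10.20.0.255 destination. The "vm" entry and its /16 netmask are hypothetical and exist only to show a mismatch being caught.

```python
import ipaddress
from collections import Counter


def broadcast_for(iface):
    """Broadcast address for an 'address/prefixlen' string, e.g. '10.20.0.201/24'."""
    # IPv4Interface keeps the host address; .network is the containing subnet.
    return ipaddress.IPv4Interface(iface).network.broadcast_address


def odd_ones_out(nodes):
    """Return {name: broadcast} for nodes whose broadcast differs from the most common one."""
    bcasts = {name: broadcast_for(iface) for name, iface in nodes.items()}
    common, _ = Counter(bcasts.values()).most_common(1)[0]
    return {name: str(b) for name, b in bcasts.items() if b != common}


# xen1/xen2 addresses are from the thread; the /24 prefix is inferred, and
# the "vm" entry is hypothetical (a deliberately wrong /16 netmask).
nodes = {
    "xen1": "10.20.0.201/24",
    "xen2": "10.20.0.202/24",
    "vm": "10.20.0.203/16",
}
mismatches = odd_ones_out(nodes)
print(mismatches)  # {'vm': '10.20.255.255'}
```

A guest with the wrong netmask would send its broadcasts to a different address from the physical hosts. That is why this check comes before looking at bridge filtering.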