Re: cman startup issue

gordan@xxxxxxxxxx · Wed, 7 Nov 2007 13:25:53 +0000 (GMT)

On Wed, 7 Nov 2007, Patrick Caulfield wrote:

I'm having a weird problem. I am using a shared GFS root file system,
and the same initrd image on all the machines. The cluster has 3
machines on it at the moment, and 1 refuses to join the cluster,
regardless of which order I bring them up in.

When the cman service starts, it fails with:

cman not started: Can't find local node name in cluster.conf
/usr/local/sbin/cman_tool: aisexec daemon didn't start

If I try to run aisexec, I get:
aisexec: totemsrp.c:2867: memb_ring_id_store: Assertion `0' failed.

Where should I be looking for causes of this? I double-checked my
cluster.conf, and the MAC addresses, IP addresses and interface names
are correct in each node's config.

Check that the new node can write into /tmp, where it is trying to
store the current ring ID. It could be SELinux, or perhaps the
permissions on the file it is trying to create.
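A quick way to test the /tmp suggestion is to actually create and write a file there, rather than only looking at the permission bits. The sketch below is illustrative only: `dir_is_writable` is a hypothetical helper, not part of cman or openais. Run it as the same user aisexec runs as. It exercises the checks that apply to the calling process, so it will not reproduce an SELinux denial that applies only to the aisexec domain.

```python
import tempfile


def dir_is_writable(path: str = "/tmp") -> bool:
    """Return True if a file can really be created and written in `path`.

    Hypothetical helper for illustration. Creating a real file (instead of
    only inspecting mode bits) goes through the same kernel checks a daemon
    running as this user would hit, including security-module checks for
    the calling process.
    """
    try:
        # The file is deleted automatically when the context manager exits.
        with tempfile.NamedTemporaryFile(dir=path) as probe:
            probe.write(b"ring-id write probe")
            probe.flush()
        return True
    except OSError:
        # Covers permission denied, read-only filesystem, missing directory, ...
        return False


if __name__ == "__main__":
    print("/tmp writable:", dir_is_writable("/tmp"))
```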

That fixed the aisexec problem, but the "Can't find local node name in
cluster.conf" problem remains, and cman still won't start. :-(

Well, it won't start if it can't find the local node name in
cluster.conf... Have you double-checked that the name(s) in
cluster.conf match those on the ethernet interfaces?

You mean as in:
<eth name="eth1" mac="my:ma:ca:dd:re:ss" ip="10.1.2.3"
mask="255.255.255.0"/>
?

If so, then yes, I checked it about 10 times. That was the first thing I
thought was wrong. :-(

As I don't have your cluster.conf or access to your DNS server, it's hard to say
from here, but that message does mean what it says. If you have older software
it might not detect anything other than the node's main hostname, but later
versions will check all the interfaces on the system for something that matches
anything in cluster.conf.
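To make the matching idea concrete, here is a rough Python sketch of "find which `<clusternode name=...>` in cluster.conf refers to this machine": first compare node names (full and short) against the host's own names, then resolve each node name and compare the result against the host's addresses. This is not cman's actual code, and the helper names (`clusternode_names`, `find_local_node`) are hypothetical. How the local names and addresses are gathered is left to the caller as an assumption; resolution is injectable so that DNS or /etc/hosts lookups can be swapped out.

```python
import socket
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, Optional


def clusternode_names(conf_xml: str) -> list[str]:
    """Return the name attribute of every <clusternode> element."""
    root = ET.fromstring(conf_xml)
    return [n.get("name") for n in root.iter("clusternode") if n.get("name")]


def _resolve(name: str) -> Optional[str]:
    """Resolve a node name to an IPv4 address via the system resolver."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def find_local_node(
    conf_xml: str,
    local_names: Iterable[str],
    local_addrs: Iterable[str],
    resolve: Callable[[str], Optional[str]] = _resolve,
) -> Optional[str]:
    """Return the cluster.conf node name that refers to this host, or None."""
    names = clusternode_names(conf_xml)
    wanted_names = {n.lower() for n in local_names}
    wanted_addrs = set(local_addrs)
    # Pass 1: match on the name itself (full or short form).
    for name in names:
        short = name.split(".")[0].lower()
        if name.lower() in wanted_names or short in wanted_names:
            return name
    # Pass 2: match on what the name resolves to. After moving the cluster
    # to a new IP range, this only works if DNS or /etc/hosts was updated too.
    for name in names:
        if resolve(name) in wanted_addrs:
            return name
    return None
```

For example, with a minimal (made-up) cluster.conf:

```python
conf = """<cluster name="c1" config_version="1">
  <clusternodes>
    <clusternode name="node1.example.com" nodeid="1"/>
    <clusternode name="node2.example.com" nodeid="2"/>
  </clusternodes>
</cluster>"""

print(find_local_node(conf, [socket.gethostname()], []))
```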

Well, the thing that really puzzles me is that the same cluster used to
work before. All I effectively did was move it to a different IP range and
change cluster.conf. I can't figure out what could have changed in the
meantime to break it, other than cluster.conf. The only other thing that's
different is that some of the machines have eth1 and eth0 reversed. Before,
they all used eth1 for cluster configuration, and now one of them uses
eth0 (it's a slightly different model, and the manufacturer mislabeled the
ports on it). But I have two identical machines, and one connects while the
other doesn't. It really has me stumped.

I see you're using eth1, so make sure you have an up-to-date cman.

I'm running the latest that is available for RHEL5.

Gordan

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster