gordan@xxxxxxxxxx wrote:
> On Wed, 7 Nov 2007, Patrick Caulfield wrote:
>
>>>>>>> I'm having a weird problem. I am using a shared GFS root file system,
>>>>>>> and the same initrd image on all the machines. The cluster has 3
>>>>>>> machines on it at the moment, and 1 refuses to join the cluster,
>>>>>>> regardless of which order I bring them up in.
>>>>>>>
>>>>>>> When cman service is being started, it fails when starting cman:
>>>>>>>
>>>>>>> cman not started: Can't find local node name in cluster.conf
>>>>>>> /usr/local/sbin/cman_tool: aisexec daemon didn't start
>>>>>>>
>>>>>>> If I try to run aisexec, I get:
>>>>>>> aisexec: totemsrp.c:2867: memb_ring_id_store: Assertion `0' failed.
>>>>>>>
>>>>>>> Where should I be looking for causes of this? I double checked my
>>>>>>> cluster.conf and the MAC addresses, IP addresses and interface
>>>>>>> names are correct in each node's config.
>>>>>>
>>>>>> Check that the new node can write into /tmp - where it is trying to
>>>>>> store the current ring-id. It could be SELinux or perhaps the
>>>>>> permissions on the file it is trying to create.
>>>>>
>>>>> That fixed the aisexec problem, but the "Can't find local node name in
>>>>> cluster.conf" problem remains, and cman still won't start. :-(
>>>>
>>>> Well, it won't start if it can't find the local node name in
>>>> cluster.conf ...
>>>> Have you double-checked that the name(s) in cluster.conf match those
>>>> on the ethernet interfaces?
>>>
>>> You mean as in:
>>> <eth name="eth1" mac="my:ma:ca:dd:re:ss" ip="10.1.2.3"
>>> mask="255.255.255.0"/>
>>> ?
>>>
>>> If so, then yes, I checked it about 10 times. That was the first thing I
>>> thought was wrong. :-(
>>
>> As I don't have your cluster.conf or access to your DNS server it's
>> hard to say from here, but that message does mean what it says.
>> If you have older software it might not detect anything other than the
>> node's main hostname, but later versions will check all the interfaces
>> on the system for something that matches anything in cluster.conf.
>
> Well, the thing that really puzzles me is that the same cluster used to
> work before. All I effectively did was move it to a different IP range
> and change cluster.conf. I can't figure out what could have changed in
> the meantime to break it, other than cluster.conf. The only other thing
> that's different is that some of the machines have eth1 and eth0
> reversed. Before, they all used eth1 for cluster configuration, and now
> one of them uses eth0 (slightly different model, and the manufacturer
> mislabeled the ports on them). But I have two identical machines, and one
> connects, the other doesn't. It really has me stumped.
>
>> I see you're using eth1 so make sure you do have an up-to-date cman.
>
> I'm running the latest that is available for RHEL5.

If that's what came with 5.0 then there's a bug in the name matching.
Unfortunately, I can't figure out from the CVS tags in which package this
was fixed.

"revision 1.26
date: 2007/03/15 11:12:33; author: pcaulfield; state: Exp; lines: +16 -13
If the machine is multi-homed, then using a truncated name in uname but
not in cluster.conf would fail to match them up."

--
Patrick

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
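For anyone wanting to check for the truncated-name mismatch described in that CVS log entry, here is a rough sketch in Python. It is not cman's own matching code, only an illustration of the comparison. It assumes the node names live in the name attribute of <clusternode> elements, and the helper names node_names and match_local_name are made up for this example. It does not look at interface addresses at all.

```python
import xml.etree.ElementTree as ET


def node_names(conf_xml):
    """Return the name attribute of every <clusternode> element (assumed layout)."""
    root = ET.fromstring(conf_xml)
    return [n.get("name") for n in root.iter("clusternode") if n.get("name")]


def match_local_name(conf_xml, local_name):
    """Classify how local_name relates to the node names in cluster.conf.

    Returns ("exact", name) when the names are identical,
    ("short", name) when they agree only up to the first dot
    (one side is truncated, the other fully qualified),
    and ("none", None) when nothing matches.
    """
    names = node_names(conf_xml)
    if local_name in names:
        return ("exact", local_name)
    local_short = local_name.split(".")[0]
    for name in names:
        if name.split(".")[0] == local_short:
            return ("short", name)
    return ("none", None)
```

Feed it the text of cluster.conf and the output of "uname -n" on the node that refuses to join. A "short" result means the two names differ only by truncation, which is the case the revision 1.26 log message describes as failing on multi-homed machines.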