Patrick Caulfield wrote:
> gordan@xxxxxxxxxx wrote:
>> On Wed, 7 Nov 2007, Patrick Caulfield wrote:
>>
>>>>>>>> I'm having a weird problem. I am using a shared GFS root file
>>>>>>>> system and the same initrd image on all the machines. The
>>>>>>>> cluster has 3 machines on it at the moment, and 1 refuses to
>>>>>>>> join the cluster, regardless of which order I bring them up in.
>>>>>>>>
>>>>>>>> When the cman service is being started, it fails when starting
>>>>>>>> cman:
>>>>>>>>
>>>>>>>> cman not started: Can't find local node name in cluster.conf
>>>>>>>> /usr/local/sbin/cman_tool: aisexec daemon didn't start
>>>>>>>>
>>>>>>>> If I try to run aisexec, I get:
>>>>>>>> aisexec: totemsrp.c:2867: memb_ring_id_store: Assertion `0' failed.
>>>>>>>>
>>>>>>>> Where should I be looking for causes of this? I double-checked
>>>>>>>> my cluster.conf, and the MAC addresses, IP addresses and
>>>>>>>> interface names are correct in each node's config.
>>>>>>>
>>>>>>> Check that the new node can write into /tmp - that is where it
>>>>>>> is trying to store the current ring-id. It could be SELinux, or
>>>>>>> perhaps the permissions on the file it is trying to create.
>>>>>>
>>>>>> That fixed the aisexec problem, but the "Can't find local node
>>>>>> name in cluster.conf" problem remains, and cman still won't
>>>>>> start. :-(
>>>>>
>>>>> Well, it won't start if it can't find the local node name in
>>>>> cluster.conf ... Have you double-checked that the name(s) in
>>>>> cluster.conf match those on the ethernet interfaces?
>>>>
>>>> You mean as in:
>>>>
>>>> <eth name="eth1" mac="my:ma:ca:dd:re:ss" ip="10.1.2.3"
>>>> mask="255.255.255.0"/>
>>>>
>>>> ? If so, then yes, I checked it about 10 times. That was the first
>>>> thing I thought was wrong. :-(
>>>
>>> As I don't have your cluster.conf or access to your DNS server, it's
>>> hard to say from here, but that message does mean what it says. If
>>> you have older software it might not detect anything other than the
>>> node's main hostname, but later versions will check all the
>>> interfaces on the system for something that matches anything in
>>> cluster.conf.
>>
>> Well, the thing that really puzzles me is that the same cluster used
>> to work before. All I effectively did was move it to a different IP
>> range and change cluster.conf. I can't figure out what could have
>> changed in the meantime to break it, other than cluster.conf. The
>> only other thing that's different is that some of the machines have
>> eth1 and eth0 reversed. Before, they all used eth1 for the cluster,
>> and now one of them uses eth0 (slightly different model, and the
>> manufacturer mislabeled the ports on them). But I have two identical
>> machines, and one connects while the other doesn't. It really has me
>> stumped.
>>
>>> I see you're using eth1, so make sure you do have an up-to-date cman.
>>
>> I'm running the latest that is available for RHEL5.
>
> If that's what came with 5.0 then there's a bug in the name matching.
> Unfortunately, I can't figure out from the CVS tags which package this
> was fixed in.
>
> "revision 1.26
> date: 2007/03/15 11:12:33;  author: pcaulfield;  state: Exp;  lines: +16 -13
> If the machine is multi-homed, then using a truncated name in uname
> but not in cluster.conf would fail to match them up."

Well, I can tell you that the fix is NOT in cman-2.0.61, and it IS in
cman-2.0.73. Sorry I can't be more specific!

--
Patrick

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
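
For reference, cman matches the local machine against the <clusternode>
names in /etc/cluster/cluster.conf, either by the node's hostname or by
an address bound to one of its interfaces (as Patrick describes above).
A minimal sketch of the relevant section, with purely hypothetical
cluster and node names:

  <?xml version="1.0"?>
  <cluster name="example" config_version="1">
    <cman/>
    <clusternodes>
      <!-- Each name must resolve to an address configured on that
           node, or match its `uname -n` output. -->
      <clusternode name="node1.example.com" nodeid="1"/>
      <clusternode name="node2.example.com" nodeid="2"/>
      <clusternode name="node3.example.com" nodeid="3"/>
    </clusternodes>
    <fencedevices/>
  </cluster>

Per the CVS log quoted above, cman versions before 2.0.73 could fail
this match on a multi-homed machine when `uname -n` returned a short
name but cluster.conf carried the fully-qualified one (or vice versa),
so upgrading cman or making the two forms agree are both worth trying.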