On Fri, 2008-02-08 at 19:33 -0200, Celso K. Webber wrote:
> Hi Lon,
>
> On Fri, 08 Feb 2008 16:15:36 -0500, Lon Hohberger wrote
> > On Fri, 2008-02-08 at 11:18 -0200, Celso K. Webber wrote:
> > > Feb 7 20:07:01 mrp02 kernel: dlm: no local IP address has been set
> > > Feb 7 20:07:01 mrp02 kernel: dlm: cannot start dlm lowcomms -107
> >
> > This is why rgmanager didn't work (and possibly even exited). Does
> > 'uname -n' match what's in cluster.conf?
> >
> No, it does not! I didn't know it should match. I've been configuring RHCS
> clusters since version 2.1 and this never bothered me, sorry!!!
>
> Well, I usually do the following in /etc/hosts:
> -> assume network 192.168.1.0/24 is for public access
> -> assume network 10.0.0.0/8 is for heartbeat
>
> 192.168.1.1 realservername1.domainname realservername1
> 192.168.1.2 realservername2.domainname realservername2
>
> 10.0.0.1 node1.localdomain node1
> 10.0.0.2 node2.localdomain node2
>
> 192.168.1.3 servicename1.domainname servicename1
> 192.168.1.4 servicename2.domainname servicename2
> ... and so on for other virtual IPs for services ...
>
> Then I configure in cluster.conf the names associated with the private
> addresses/interfaces, so that I'm sure that heartbeat traffic is going
> through the correct interfaces.
>
> For obvious reasons, "uname -n" returns the public hostnames, such as
> realservername1.domainname.
>
> I noticed that for some time there has been a question in the FAQ explaining
> how to "bind" the heartbeat traffic to a specific interface/address. But I was
> happy with my solution, especially because the answer to that question
> suggested touching the init script, and I don't like to alter standard system
> files, especially init scripts. At least in RHCS v4, I didn't find a better
> way to "bind" the heartbeat traffic to a specific interface. I didn't
> experiment with this on RHCS v5, I just went on with my previous method.
>
> For me this is common practice, for instance, Oracle Database respects an
> environment variable called ORACLE_HOSTNAME, so that you can "instruct" the
> several utilities to consider that name instead of the real server's name.
> This is very useful in a Cluster environment.
>
> Please tell me:
> * is it really wrong to set the node names in cluster.conf to a name different
> from that reported by "uname -n"?
> * if it is "ugly" or considered wrong, what is the best way to instruct CMAN
> which interface to use for heartbeat?

I think it's mostly fixed in RHEL5. We have updated the CMAN init script
for RHEL5 to allow /etc/sysconfig/cluster to have
"NODENAME=preferred_host_name". It will go out with the next update,
but here it is in CVS:

*massive url*
http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/cluster/cman/init.d/Attic/cman?rev=1.26.2.6&content-type=text/plain&cvsroot=cluster&hideattic=0&only_with_tag=RHEL5

tinyurl:
http://tinyurl.com/2fg6nd

It still could be a bug. The DLM being unable to determine the local
hostname is definitely why rgmanager died (it needs the DLM!). Updating
the script / trying to force CMAN with a specific node name is just one
way to eliminate a possible cause (and it might fix it, too ;) ).

> * does this solution work both for RHCS v4 and v5?

The RHEL5 script is not backwards compatible, but cman_tool join -n
<preferred_host_name> is.

> * would it be better to have only one interface for public and heartbeat
> traffic, maybe channel bonding dual NICs?

Better is certainly a matter of perception in this case. I would expect
you'd want to get your current configuration working before altering
your network topology. Also, it's not like your configuration is
particularly strange...

> * is there any other significant difference between RHCSv4 and v5 I should be
> aware of?
> As always, thank you very very much for your support!

We do what we can, but please keep in mind that a public mailing list
isn't a very good support forum compared to (for example):

https://www.redhat.com/apps/support/

-- Lon

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
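
The name check and the two overrides discussed in this message can be sketched as a small shell script. This is only a rough illustration under several assumptions: that cluster.conf lives at /etc/cluster/cluster.conf, that each node is declared as <clusternode name="..."> with name as the first attribute, and that node1.localdomain (taken from the example /etc/hosts) is the name you want CMAN to use. The helper node_in_conf and the CLUSTER_CONF variable are hypothetical names introduced here, not part of cman.

```shell
#!/bin/sh
# Sketch: does the output of "uname -n" appear as a clusternode name?
# Assumption: nodes are declared as <clusternode name="..." ...>.

node_in_conf() {
    # $1 = node name to look for, $2 = path to cluster.conf
    # Fixed-string match including the closing quote, so "node1" does not
    # match "node10".
    grep -F -q "<clusternode name=\"$1\"" "$2"
}

# CLUSTER_CONF is a hypothetical override, handy for trying this on a copy.
conf="${CLUSTER_CONF:-/etc/cluster/cluster.conf}"
name="$(uname -n)"

if [ -r "$conf" ]; then
    if node_in_conf "$name" "$conf"; then
        echo "OK: $name is a clusternode name in $conf"
    else
        echo "MISMATCH: $name is not a clusternode name in $conf"
        # Overrides mentioned in the thread (example name is an assumption):
        #   RHEL5, with the updated cman init script, in /etc/sysconfig/cluster:
        #       NODENAME=node1.localdomain
        #   RHEL4 and RHEL5, joining by hand:
        #       cman_tool join -n node1.localdomain
    fi
fi
```

With the /etc/hosts layout quoted above, one would expect the MISMATCH branch on a host whose "uname -n" returns realservername1.domainname while cluster.conf lists node1.localdomain.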