Thanks, Kaloyan. Now we're talking. This is something I hadn't tried
yet. I will try it as soon as I get in.

Wes

On 1/9/2012 3:08 AM, Kaloyan Kovachev wrote:
> Hi,
> check /etc/sysconfig/cman, maybe there is a different name present as
> NODENAME ... remove the file (if present) or try to create one as:
>
> #CMAN_CLUSTER_TIMEOUT=120
> #CMAN_QUORUM_TIMEOUT=0
> #CMAN_SHUTDOWN_TIMEOUT=60
> FENCED_START_TIMEOUT=120
> ##FENCE_JOIN=no
> #LOCK_FILE="/var/lock/subsys/cman"
> CLUSTERNAME=ClusterName
> NODENAME=NodeName
>
> On Sun, 08 Jan 2012 20:03:18 -0800, Wes Modes <wmodes@xxxxxxxx> wrote:
>> The way cman resolves cluster node names is less than clear, as per
>> the RHEL bugzilla report.
>>
>> The hostname and cluster.conf match, as do /etc/hosts and uname -n.
>> The short names and FQDNs both ping. I believe the cluster.conf
>> files on all nodes are in sync, and all nodes are accessible to each
>> other using either short or long names.
>>
>> You'll have to trust that I've tried everything obvious, and every
>> possible combination of FQDN and short names in cluster.conf and
>> hostname. That said, it is entirely possible I missed something
>> obvious.
>>
>> I suspect there is something else going on, and I don't know how to
>> get at it.
>>
>> Wes
>>
>> On 1/6/2012 6:06 PM, Kevin Stanton wrote:
>>>> Hi,
>>>> I think CMAN expects the names of the cluster nodes to be the same
>>>> as those returned by the command "uname -n".
>>>>
>>>> From what you write, your nodes' hostnames are test01.gdao.ucsc.edu
>>>> and test02.gdao.ucsc.edu, but in cluster.conf you have declared
>>>> only "test01" and "test02".
>>>
>>> I haven't found this to be the case in the past. I actually use a
>>> separate short name to reference each node, which is different from
>>> the hostname of the server itself. All I've ever had to do is make
>>> sure it resolves correctly. You can do this in DNS and/or in
>>> /etc/hosts. I have found that it's a good idea to do both, in case
>>> your DNS server is a virtual machine and is not running for some
>>> reason. In that case, with /etc/hosts you can still start cman.
>>>
>>> I would make sure whatever node names you use in cluster.conf
>>> resolve when you ping them from every node in the cluster. Also
>>> make sure your cluster.conf is in sync between all nodes.
>>>
>>> -Kevin
>>>
>>> ------------------------------------------------------------------------
>>>
>>> These servers are currently on the same host, but may not be in the
>>> future. They are in a VM cluster (though honestly, I'm not sure
>>> what this means yet).
>>>
>>> SELinux is enabled, but permissive.
>>> Firewalling through iptables is turned off via
>>> system-config-securitylevel.
>>>
>>> There is currently no line in cluster.conf that deals with
>>> multicasting.
>>>
>>> Any other suggestions?
>>>
>>> Wes
>>>
>>> On 1/6/2012 12:05 PM, Luiz Gustavo Tonello wrote:
>>>
>>> Hi,
>>>
>>> Are these servers on VMware? On the same host?
>>> Is SELinux disabled? Is there anything in iptables?
>>>
>>> In my environment I had a problem starting GFS2 with the servers on
>>> different hosts.
>>> To get the servers clustered, I had to migrate one server to the
>>> same host as the other and restart it.
>>>
>>> I think one of the problems was the virtual switches.
>>> To work around it, I changed the multicast IP to 225.0.0.13 in
>>> cluster.conf:
>>>
>>>     <multicast addr="225.0.0.13"/>
>>>
>>> and added a static route on both nodes, using the default gateway.
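>>>
>>> For example (only a sketch; I'm assuming eth0 is the interface that
>>> carries the cluster traffic, so adjust the name to your setup), the
>>> route can look like this:
>>>
>>>     # steer the chosen multicast group out the cluster interface
>>>     route add -host 225.0.0.13 dev eth0
>>>
>>>     # or pin the whole multicast range (224.0.0.0/4) instead
>>>     ip route add 224.0.0.0/4 dev eth0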
>>>
>>> I don't know if it's correct, but it solved my problem.
>>>
>>> I hope that helps you.
>>>
>>> Regards.
>>>
>>> On Fri, Jan 6, 2012 at 5:01 PM, Wes Modes <wmodes@xxxxxxxx> wrote:
>>>
>>> Hi, Steven.
>>>
>>> I've tried just about every possible combination of hostname and
>>> cluster.conf.
>>>
>>> ping to test01 resolves to 128.114.31.112
>>> ping to test01.gdao.ucsc.edu resolves to 128.114.31.112
>>>
>>> It feels like the right thing is being returned. This feels like it
>>> might be a quirk (or possibly a bug) in cman or openais.
>>>
>>> There are some old bug reports around this, for example
>>> https://bugzilla.redhat.com/show_bug.cgi?id=488565. It sounds like
>>> the way that cman reports this error is anything but
>>> straightforward.
>>>
>>> Is there anyone who has encountered this error and found a
>>> solution?
>>>
>>> Wes
>>>
>>> On 1/6/2012 2:00 AM, Steven Whitehouse wrote:
>>> > Hi,
>>> >
>>> > On Thu, 2012-01-05 at 13:54 -0800, Wes Modes wrote:
>>> >> Howdy, y'all. I'm trying to set up GFS in a cluster on CentOS
>>> >> systems running on VMware. The GFS filesystem is on a Dell
>>> >> EqualLogic SAN.
>>> >>
>>> >> I keep running into the same problem despite many
>>> >> differently-flavored attempts to set up GFS. The problem comes
>>> >> when I try to start cman, the cluster management software.
>>> >>
>>> >> [root@test01]# service cman start
>>> >> Starting cluster:
>>> >>    Loading modules... done
>>> >>    Mounting configfs... done
>>> >>    Starting ccsd... done
>>> >>    Starting cman... failed
>>> >> cman not started: Can't find local node name in cluster.conf
>>> >> /usr/sbin/cman_tool: aisexec daemon didn't start
>>> >>                                                      [FAILED]
>>> >>
>>> > This looks like what it says: whatever the node name is in
>>> > cluster.conf, either it doesn't exist when the name is looked up,
>>> > or possibly it does exist but is mapped to the loopback address
>>> > (it needs to map to an address which is valid cluster-wide).
>>> >
>>> > Since your config files look correct, the next thing to check is
>>> > what the resolver is actually returning. Try (for example) a ping
>>> > to test01 (you need to specify exactly the same form of the name
>>> > as is used in cluster.conf) from test02 and see whether it uses
>>> > the correct ip address, just in case the wrong thing is being
>>> > returned.
>>> >
>>> > Steve.
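>>> >
>>> > A quick way to see the exact answer the resolver hands back (only
>>> > a sketch; getent hosts follows the same /etc/nsswitch.conf lookup
>>> > order as ordinary libc name resolution, which is what cman relies
>>> > on):
>>> >
>>> >     # both should print a cluster-reachable address, never
>>> >     # 127.0.0.1, and the same answer from every node
>>> >     getent hosts test01
>>> >     getent hosts $(uname -n)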
>>> >
>>> >> [root@test01]# tail /var/log/messages
>>> >> Jan 5 13:39:40 testbench06 ccsd[13194]: Unable to connect to
>>> >> cluster infrastructure after 1193640 seconds.
>>> >> Jan 5 13:40:10 testbench06 ccsd[13194]: Unable to connect to
>>> >> cluster infrastructure after 1193670 seconds.
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive
>>> >> Service RELEASE 'subrev 1887 version 0.80.6'
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C)
>>> >> 2002-2006 MontaVista Software, Inc and contributors.
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C)
>>> >> 2006 Red Hat, Inc.
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive
>>> >> Service: started and ready to provide service.
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] local node
>>> >> name "test01.gdao.ucsc.edu" not found in cluster.conf
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading
>>> >> CCS info, cannot start
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading
>>> >> config from CCS
>>> >> Jan 5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive
>>> >> exiting (reason: could not read the main configuration file).
>>> >>
>>> >> Here are the details of my configuration:
>>> >>
>>> >> [root@test01]# rpm -qa | grep cman
>>> >> cman-2.0.115-85.el5_7.2
>>> >>
>>> >> [root@test01]# echo $HOSTNAME
>>> >> test01.gdao.ucsc.edu
>>> >>
>>> >> [root@test01]# hostname
>>> >> test01.gdao.ucsc.edu
>>> >>
>>> >> [root@test01]# cat /etc/hosts
>>> >> # Do not remove the following line, or various programs
>>> >> # that require network functionality will fail.
>>> >> 128.114.31.112   test01 test01.gdao test01.gdao.ucsc.edu
>>> >> 128.114.31.113   test02 test02.gdao test02.gdao.ucsc.edu
>>> >> 127.0.0.1        localhost.localdomain localhost
>>> >> ::1              localhost6.localdomain6 localhost6
>>> >>
>>> >> [root@test01]# sestatus
>>> >> SELinux status:           enabled
>>> >> SELinuxfs mount:          /selinux
>>> >> Current mode:             permissive
>>> >> Mode from config file:    permissive
>>> >> Policy version:           21
>>> >> Policy from config file:  targeted
>>> >>
>>> >> [root@test01]# cat /etc/cluster/cluster.conf
>>> >> <?xml version="1.0"?>
>>> >> <cluster config_version="25" name="gdao_cluster">
>>> >>     <fence_daemon post_fail_delay="0" post_join_delay="120"/>
>>> >>     <clusternodes>
>>> >>         <clusternode name="test01" nodeid="1" votes="1">
>>> >>             <fence>
>>> >>                 <method name="single">
>>> >>                     <device name="gfs_vmware"/>
>>> >>                 </method>
>>> >>             </fence>
>>> >>         </clusternode>
>>> >>         <clusternode name="test02" nodeid="2" votes="1">
>>> >>             <fence>
>>> >>                 <method name="single">
>>> >>                     <device name="gfs_vmware"/>
>>> >>                 </method>
>>> >>             </fence>
>>> >>         </clusternode>
>>> >>     </clusternodes>
>>> >>     <cman/>
>>> >>     <fencedevices>
>>> >>         <fencedevice agent="fence_manual" name="gfs1_ipmi"/>
>>> >>         <fencedevice agent="fence_vmware" name="gfs_vmware"
>>> >>             ipaddr="gdvcenter.ucsc.edu" login="root"
>>> >>             passwd="1hateAmazon.com" vmlogin="root"
>>> >>             vmpasswd="esxpass"
>>> >>             port="/vmfs/volumes/49086551-c64fd83c-0401-001e0bcd6848/eagle1/gfs1.vmx"/>
>>> >>     </fencedevices>
>>> >>     <rm>
>>> >>         <failoverdomains/>
>>> >>     </rm>
>>> >> </cluster>
>>> >>
>>> >> I've seen much discussion of this problem, but no definitive
>>> >> solutions. Any help you can provide will be welcome.
>>> >>
>>> >> Wes Modes
>>>
>>> --
>>> Luiz Gustavo P Tonello.
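
P.S. For the archives: the combination that openais log line points at
is clusternode names that match `uname -n` exactly. This is a sketch
of that variant only, not a confirmed fix; the config_version is
bumped so ccsd will accept and propagate the new file, and the
unchanged cman, fencedevices, and rm sections are elided:

    <?xml version="1.0"?>
    <cluster config_version="26" name="gdao_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="120"/>
        <clusternodes>
            <!-- names match `uname -n` on each node -->
            <clusternode name="test01.gdao.ucsc.edu" nodeid="1"
                         votes="1">
                <fence>
                    <method name="single">
                        <device name="gfs_vmware"/>
                    </method>
                </fence>
            </clusternode>
            <clusternode name="test02.gdao.ucsc.edu" nodeid="2"
                         votes="1">
                <fence>
                    <method name="single">
                        <device name="gfs_vmware"/>
                    </method>
                </fence>
            </clusternode>
        </clusternodes>
        <!-- cman, fencedevices, and rm sections unchanged -->
    </cluster>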
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster