Two-node cluster: starting CMAN fences the other node

Alex Re <are@xxxxxx> · Thu, 15 Apr 2010 19:05:01 +0200

Good afternoon,

I'm trying to form my first cluster of two nodes, using iLO fence
devices. I need some help because I can't find what I've missed. 

My main problem is that running "service cman start" on one node reboots
the other node, so I can't form the two-node cluster.

This is what I'm using on both nodea and nodeb (they are on the same VLAN
and can ping each other fine):

[root@nodea ~]# uname -a
Linux nodea 2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 11:30:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

[root@nodea ~]# rpm -qa | grep cman
cman-2.0.115-1.el5_4.9

[root@nodea ~]# cat /etc/cluster/cluster.conf    (nodeb has the same file)

<?xml version="1.0" ?>
<cluster alias="VCluster" config_version="5" name="VCluster">
    <fence_daemon post_fail_delay="0" post_join_delay="25"/>
    <clusternodes>
        <clusternode name="nodea" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device name="nodeaILO"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="nodeb" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device name="nodebILO"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman expected_votes="1" two_node="1"/>
    <fencedevices>
        <fencedevice agent="fence_ilo" hostname="nodeacn" login="user" name="nodeaILO" passwd="hp"/>
        <fencedevice agent="fence_ilo" hostname="nodebcn" login="user" name="nodebILO" passwd="hp"/>
    </fencedevices>
    <rm>
        <failoverdomains/>
        <resources/>
    </rm>
</cluster>
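
As a sanity check on the file above, here is a minimal Python sketch that verifies two things: every fence device referenced by a node is actually defined under fencedevices, and the two_node="1" setting comes with exactly two nodes and expected_votes="1". The function name check_cluster_conf is made up for illustration and is not part of any cluster tool. The checks are basic consistency checks only, so an empty result does not rule out other causes.

```python
import xml.etree.ElementTree as ET

# The cluster.conf from the post, pasted in as a string for the example.
CLUSTER_CONF = """<?xml version="1.0" ?>
<cluster alias="VCluster" config_version="5" name="VCluster">
    <fence_daemon post_fail_delay="0" post_join_delay="25"/>
    <clusternodes>
        <clusternode name="nodea" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device name="nodeaILO"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="nodeb" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device name="nodebILO"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman expected_votes="1" two_node="1"/>
    <fencedevices>
        <fencedevice agent="fence_ilo" hostname="nodeacn" login="user" name="nodeaILO" passwd="hp"/>
        <fencedevice agent="fence_ilo" hostname="nodebcn" login="user" name="nodebILO" passwd="hp"/>
    </fencedevices>
    <rm>
        <failoverdomains/>
        <resources/>
    </rm>
</cluster>
"""


def check_cluster_conf(xml_text):
    """Return a list of consistency problems found in a cluster.conf string.

    Hypothetical helper: it only looks at fence device references and the
    two_node / expected_votes pairing, nothing else.
    """
    root = ET.fromstring(xml_text)
    problems = []

    # Names of all fence devices defined under <fencedevices>.
    defined = {fd.get("name") for fd in root.iter("fencedevice")}

    nodes = root.findall("clusternodes/clusternode")
    for node in nodes:
        devices = node.findall("fence/method/device")
        if not devices:
            problems.append(f"{node.get('name')}: no fence device configured")
        for dev in devices:
            if dev.get("name") not in defined:
                problems.append(
                    f"{node.get('name')}: fence device "
                    f"{dev.get('name')!r} is not defined"
                )

    # two_node mode is meant for exactly two nodes with expected_votes="1".
    cman = root.find("cman")
    if cman is not None and cman.get("two_node") == "1":
        if len(nodes) != 2:
            problems.append(f"two_node=1 but {len(nodes)} nodes are defined")
        if cman.get("expected_votes") != "1":
            problems.append("two_node=1 requires expected_votes=1")

    return problems


if __name__ == "__main__":
    print(check_cluster_conf(CLUSTER_CONF))
```

The file as posted passes these checks, so the device references and the two-node settings are at least internally consistent.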

When I start the cman service on nodea, it hangs for some time at the
"Starting fencing..." step, and once the configured 25 seconds
(post_join_delay) have passed it fences nodeb and reboots it.

[root@nodea ~]# service cman start
Starting cluster: 
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... done
                                                           [  OK  ]

"nodeb" gets rebooted:

[root@nodeb ~]# 

Broadcast message from root (Thu Apr 15 18:42:24 2010):

The system is going down for system halt NOW!

In the syslog I can only find:

Apr 15 18:40:59 nodea ccsd[16930]: Initial status:: Quorate
Apr 15 18:40:59 nodea openais[16936]: [CLM  ] Members Left:
Apr 15 18:40:59 nodea openais[16936]: [CLM  ] Members Joined:
Apr 15 18:40:59 nodea openais[16936]: [CLM  ] CLM CONFIGURATION CHANGE
Apr 15 18:41:00 nodea openais[16936]: [CLM  ] New Configuration:
Apr 15 18:41:00 nodea openais[16936]: [CLM  ]     r(0) ip(10.192.16.42)
Apr 15 18:41:00 nodea openais[16936]: [CLM  ] Members Left:
Apr 15 18:41:00 nodea openais[16936]: [CLM  ] Members Joined:
Apr 15 18:41:00 nodea openais[16936]: [CLM  ]     r(0) ip(10.192.16.42)
Apr 15 18:41:00 nodea openais[16936]: [SYNC ] This node is within the primary component and will provide service.
Apr 15 18:41:00 nodea openais[16936]: [TOTEM] entering OPERATIONAL state.
Apr 15 18:41:00 nodea openais[16936]: [CMAN ] quorum regained, resuming activity
Apr 15 18:41:00 nodea openais[16936]: [CLM  ] got nodejoin message 10.192.16.42
Apr 15 18:42:11 nodea fenced[16955]: nodeb not a cluster member after 25 sec post_join_delay
Apr 15 18:42:11 nodea fenced[16955]: fencing node "nodeb"
Apr 15 18:42:23 nodea fenced[16955]: fence "nodeb" success
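
To put numbers on the timeline in that log, the following Python sketch measures the gap between two syslog lines. It is only an illustration: the helper names timestamp_of and seconds_between are hypothetical, the year 2010 is supplied by hand because syslog timestamps do not carry one, and the three sample lines are copied from the log above.

```python
from datetime import datetime

# Three lines copied from the nodea syslog excerpt.
NODEA_LOG = """\
Apr 15 18:41:00 nodea openais[16936]: [CLM  ] got nodejoin message 10.192.16.42
Apr 15 18:42:11 nodea fenced[16955]: fencing node "nodeb"
Apr 15 18:42:23 nodea fenced[16955]: fence "nodeb" success
"""


def timestamp_of(log_text, marker, year=2010):
    """Return the timestamp of the first line containing `marker`.

    The first 15 characters of a classic syslog line hold "Mon DD HH:MM:SS".
    Month names are parsed with %b, which assumes an English/C locale.
    """
    for line in log_text.splitlines():
        if marker in line:
            return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    raise ValueError(f"marker not found: {marker!r}")


def seconds_between(log_text, start_marker, end_marker):
    """Seconds elapsed between the first lines matching the two markers."""
    start = timestamp_of(log_text, start_marker)
    end = timestamp_of(log_text, end_marker)
    return (end - start).total_seconds()


if __name__ == "__main__":
    # Time from nodea joining its own membership to the fence request.
    print(seconds_between(NODEA_LOG, "got nodejoin message", "fencing node"))
    # Time the fence operation itself took.
    print(seconds_between(NODEA_LOG, "fencing node", 'fence "nodeb" success'))
```

With the lines above, the first call gives 71.0 seconds and the second 12.0 seconds.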

[root@nodea ~]# clustat
Cluster Status for VCluster @ Thu Apr 15 18:55:23 2010
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 nodea                                                               1 Online, Local
 nodeb                                                               2 Offline

Then, when nodeb comes back up, I try to start cman there so that it joins
the cluster... but this time it fences "nodea":

[root@nodeb ~]# clustat
Could not connect to CMAN: No such file or directory

[root@nodeb ~]# service cman start
Starting cluster: 
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting qdiskd... done
   Starting daemons... done
   Starting fencing... (wait for 25secs again) done
                                                           [  OK  ]

"nodea" gets rebooted:

[root@nodea ~]# 

Broadcast message from root (Thu Apr 15 18:58:40 2010):

The system is going down for system halt NOW!

Apr 15 18:57:31 nodeb openais[11789]: [CLM  ] Members Joined:
Apr 15 18:57:31 nodeb openais[11789]: [CLM  ]     r(0) ip(10.192.16.44)
Apr 15 18:57:31 nodeb openais[11789]: [SYNC ] This node is within the primary component and will provide service.
Apr 15 18:57:31 nodeb openais[11789]: [TOTEM] entering OPERATIONAL state.
Apr 15 18:57:31 nodeb openais[11789]: [CMAN ] quorum regained, resuming activity
Apr 15 18:57:31 nodeb openais[11789]: [CLM  ] got nodejoin message 10.192.16.44
Apr 15 18:57:34 nodeb qdiskd[10323]: <info> Quorum Daemon Initializing
Apr 15 18:57:34 nodeb qdiskd[10323]: <crit> Initialization failed
Apr 15 18:58:42 nodeb fenced[11816]: nodea not a cluster member after 25 sec post_join_delay
Apr 15 18:58:42 nodeb fenced[11816]: fencing node "nodea"
Apr 15 18:58:54 nodeb fenced[11816]: fence "nodea" success
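
One way to compare the two syslog excerpts is to pull out the member addresses that openais reports in its CLM lines on each node. The Python sketch below does that for a few lines copied from the logs in this post. The helper name member_addresses and the regular expression are illustrative assumptions, not part of openais or cman.

```python
import re

# CLM lines copied from the nodea and nodeb syslog excerpts.
NODEA_LOG = """\
Apr 15 18:41:00 nodea openais[16936]: [CLM  ] New Configuration:
Apr 15 18:41:00 nodea openais[16936]: [CLM  ]     r(0) ip(10.192.16.42)
Apr 15 18:41:00 nodea openais[16936]: [CLM  ] got nodejoin message 10.192.16.42
"""

NODEB_LOG = """\
Apr 15 18:57:31 nodeb openais[11789]: [CLM  ] Members Joined:
Apr 15 18:57:31 nodeb openais[11789]: [CLM  ]     r(0) ip(10.192.16.44)
Apr 15 18:57:31 nodeb openais[11789]: [CLM  ] got nodejoin message 10.192.16.44
"""

# Matches an IPv4 address after "ip(" or "nodejoin message " on a [CLM ] line.
MEMBER_RE = re.compile(
    r"\[CLM\s*\].*?(?:ip\(|nodejoin message )(\d+\.\d+\.\d+\.\d+)"
)


def member_addresses(log_text):
    """Return the set of member IP addresses mentioned in CLM log lines."""
    addresses = set()
    for line in log_text.splitlines():
        match = MEMBER_RE.search(line)
        if match:
            addresses.add(match.group(1))
    return addresses


if __name__ == "__main__":
    seen_by_a = member_addresses(NODEA_LOG)
    seen_by_b = member_addresses(NODEB_LOG)
    print("nodea sees:", sorted(seen_by_a))
    print("nodeb sees:", sorted(seen_by_b))
    print("addresses seen by both:", sorted(seen_by_a & seen_by_b))
```

On these excerpts each node reports a single address (10.192.16.42 on nodea, 10.192.16.44 on nodeb, presumably each node's own), and no address appears in both logs.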

So I can't get the two nodes to join the same cluster...

I guess I'm missing something in the cluster.conf file, but I can't find
what I'm doing wrong.

Thanks for any help!

Alex Re

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster