Hi folks,
I have set up a cluster on 5.2 with system-config-cluster. It is quite
simple: the only service is an IP resource that is switched between the nodes.
The cluster started up fine the first time, and the virtual IP was where
it belonged. Since then I have not changed anything; I simply had to
restart the machines for other reasons.
Now nothing works as it should:
- shutting down clurgmgrd normally (service rgmanager stop) is impossible;
even kill -9 does not work. The only way to get rid of clurgmgrd is to run
"reboot" twice, which forces the machine down.
- after the reboot I can start the cluster again manually (I have not dared
to enable it at system startup yet); the daemons start and nothing unusual
is logged, but
a) the service containing the IP resource is not started
b) clustat on the primary node reports "timed out trying to connect to
Resource Group Manager"
c) clustat on both nodes shows the node states but does not list the
service
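A process that survives kill -9 is usually either a zombie (already dead and waiting for its parent to reap it) or stuck in uninterruptible sleep ("D" state), where SIGKILL only takes effect once the kernel call it is blocked in returns. Which of these applies to clurgmgrd here is not established. The Python 3 sketch below is a hypothetical diagnostic (the helper names parse_stat and find_process_states are made up for illustration, and it assumes a Linux /proc): it reads /proc/<pid>/stat rather than /proc/<pid>/comm, since older kernels lack the latter, and reports the state letter of every process named clurgmgrd.

```python
import os

# Common one-letter process states from proc(5); other letters map to "other".
STATE_NAMES = {
    "R": "running",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep: SIGKILL is not acted on until the blocking kernel call returns",
    "Z": "zombie: already dead, waiting for its parent to reap it",
    "T": "stopped",
}


def parse_stat(stat_line):
    """Return (comm, state) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so the state is taken from the first field after the *last* ')'.
    """
    close = stat_line.rindex(")")
    comm = stat_line[stat_line.index("(") + 1:close]
    state = stat_line[close + 1:].split()[0]
    return comm, state


def find_process_states(name, proc_root="/proc"):
    """Return [(pid, state), ...] for every process whose comm equals `name`."""
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if comm == name:
            results.append((int(entry), state))
    return results


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, state in find_process_states("clurgmgrd"):
        print(pid, state, STATE_NAMES.get(state, "other"))
```

The same information is available from `ps -o pid,stat,comm -C clurgmgrd`; the script only spells out where it comes from. A "D" would point at a blocked kernel call (for example on a cluster lock or network operation), while a "Z" would mean the daemon is already gone and only its process table entry remains.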
I have tried everything to get a clean environment (shut down the
firewall, set SELinux to permissive, etc.), but the result is always the
same. Since I did not change anything after the first successful start of
the cluster, I wonder
- whether there is some run-time data or temporary files that the resource
group manager writes to disk and tries to reread after a reboot (remember, I
had to kill it by force to be able to reboot my machines)
- whether it is possible at all to run a cluster successfully with cman and
clurgmgrd.
In case it helps, here is my cluster.conf:
<?xml version="1.0" ?>
<cluster config_version="5" name="GatewayCluster">
	<fence_daemon post_fail_delay="0" post_join_delay="3"/>
	<clusternodes>
		<clusternode name="rtr1hb" nodeid="1" votes="1">
			<fence>
				<method name="1">
					<device name="fence1" nodename="rtr1hb"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="rtr2hb" nodeid="2" votes="1">
			<fence>
				<method name="1">
					<device name="fence2" nodename="rtr2hb"/>
				</method>
			</fence>
		</clusternode>
	</clusternodes>
	<cman expected_votes="1" two_node="1"/>
	<fencedevices>
		<fencedevice agent="fence_manual" name="fence1"/>
		<fencedevice agent="fence_manual" name="fence2"/>
	</fencedevices>
	<rm>
		<failoverdomains>
			<failoverdomain name="Gateway1" ordered="1" restricted="1">
				<failoverdomainnode name="rtr1hb" priority="1"/>
				<failoverdomainnode name="rtr2hb" priority="2"/>
			</failoverdomain>
		</failoverdomains>
		<resources>
			<ip address="IP Address" monitor_link="1"/>
		</resources>
		<service autostart="1" domain="Gateway1" name="Gateway1-IP">
			<ip ref="IP Address"/>
		</service>
	</rm>
</cluster>
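As a first sanity check on a file like this, its internal cross-references can be verified mechanically. The Python 3 sketch below is an illustration only: check_cluster_conf is a hypothetical helper, and its rules (fence devices, failover domain members, service domains and ip refs must all resolve; two_node="1" is paired with expected_votes="1"; each ip resource address must parse as a literal IP address) are assumptions about what a consistent file looks like, not the validation that cman or rgmanager actually performs. Applied to the file above it reports a single problem: address="IP Address" is not a literal IP address, which may simply be a placeholder put in before posting.

```python
import ipaddress
import xml.etree.ElementTree as ET


def check_cluster_conf(xml_text):
    """Return a list of human-readable problems found in a cluster.conf string."""
    root = ET.fromstring(xml_text)
    problems = []

    node_names = {n.get("name") for n in root.iter("clusternode")}
    fence_devices = {d.get("name") for d in root.iter("fencedevice")}

    # Each node's fence <device> must name a declared <fencedevice>.
    for node in root.iter("clusternode"):
        for dev in node.iter("device"):
            if dev.get("name") not in fence_devices:
                problems.append(f"node {node.get('name')}: unknown fence device {dev.get('name')!r}")

    # Assumed rule: two_node mode goes together with expected_votes="1".
    cman = root.find("cman")
    if cman is not None and cman.get("two_node") == "1" and cman.get("expected_votes") != "1":
        problems.append('cman: two_node="1" without expected_votes="1"')

    rm = root.find("rm")
    if rm is None:
        return problems

    # Failover domain members must be declared cluster nodes.
    domains = set()
    for dom in rm.iter("failoverdomain"):
        domains.add(dom.get("name"))
        for member in dom.iter("failoverdomainnode"):
            if member.get("name") not in node_names:
                problems.append(f"failover domain {dom.get('name')}: {member.get('name')!r} is not a clusternode")

    # Global <ip> resources are referenced by their address attribute.
    ip_resources = set()
    resources = rm.find("resources")
    if resources is not None:
        for ip in resources.findall("ip"):
            addr = ip.get("address", "")
            ip_resources.add(addr)
            try:
                ipaddress.ip_address(addr)
            except ValueError:
                problems.append(f"ip resource: {addr!r} is not an IP address")

    # Services must use a known domain and reference existing ip resources.
    for svc in rm.findall("service"):
        name = svc.get("name")
        dom = svc.get("domain")
        if dom is not None and dom not in domains:
            problems.append(f"service {name}: unknown failover domain {dom!r}")
        for ref in svc.iter("ip"):
            target = ref.get("ref")
            if target is not None and target not in ip_resources:
                problems.append(f"service {name}: ip ref {target!r} has no matching resource")

    return problems
```

Usage: `print(check_cluster_conf(open("/etc/cluster/cluster.conf").read()))`. An empty list only means the file is internally consistent under these assumed rules; it says nothing about whether the daemons accept it.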
The logs show the nodes successfully joining the cluster and similar
messages; the last entry is clurgmgrd starting, after which the cluster
daemons log nothing more.
Any hint or help is appreciated. I am stuck and do not know where to
look.
Dirk
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster