Hi,
Thanks for the reply. :)
Yes, I have installed the 'fence' RPM, along with the others listed in the Red Hat Cluster Suite documentation under "RPM Selection Criteria: Red Hat Cluster Suite with DLM". The following are the RPMs I have installed:
=====RPMs Installed=====
ccs, fence, gulm, iddev, magma, magma-plugins, perl-Net-Telnet,
system-config-cluster, ipvsadm, piranha, ccs-devel, gulm-devel,
iddev-devel, magma-devel
=====END=====
I didn't install GFS.
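(As a quick sanity check, the installed packages can be verified with rpm; rgmanager is included below since that service is in use, even though it was not in the list above:)
=====package check=====
# verify each Cluster Suite RPM is actually installed
rpm -q ccs fence gulm iddev magma magma-plugins perl-Net-Telnet \
       system-config-cluster ipvsadm piranha rgmanager
=====END=====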
Here is the /var/log/messages output when I try to restart the rgmanager service on the failed node after re-enabling eth0:
=====/var/log/messages=====
rgmanager: [1074]: <notice> Shutting down Cluster Service Manager...
clurgmgrd[31777]: <err> #50: Unable to obtain cluster lock: Connection timed out
clurgmgrd[31777]: <err> #50: Unable to obtain cluster lock: Connection timed out
clurgmgrd[31777]: <warning> #67: Shutting down uncleanly
clurgmgrd: [31777]: <info> Executing /etc/rc.d/init.d/vsftpd stop
clurgmgrd: [31777]: <info> Executing /etc/rc.d/init.d/httpd stop
vsftpd: vsftpd shutdown succeeded
clurgmgrd: [31777]: <info> Removing IPv4 address 192.168.0.112 from eth0
httpd: httpd shutdown succeeded
clurgmgrd: [31777]: <info> Removing IPv4 address 192.168.0.111 from eth0
=====END=====
Then it hung forever until I manually reset the machine.
I would like to know whether the hang is caused by this line:
clurgmgrd[31777]: <err> #50: Unable to obtain cluster lock: Connection timed out
If so, why does it happen, and how can I solve it?
Also, even when I type "reboot", it hangs at "Shutting down Cluster Service Manager... Waiting for services to stop:", which forced me to press the reset button. A manual reset may corrupt the file system, so it is dangerous.
Is there any way for me to shut down rgmanager properly?
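For reference, the shutdown order I understand from the documentation is the one sketched below, assuming the standard init scripts from the ccs/cman/fence/rgmanager packages (please tell me if this is wrong):
=====shutdown order=====
# stop the cluster daemons in dependency order (run as root)
service rgmanager stop   # stop the managed services first
service fenced stop      # leave the fence domain
service cman stop        # leave the cluster membership
service ccsd stop        # stop the configuration daemon last
=====END=====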
My second question: why didn't the cluster fail over, even though the status showed the services as "started"? Is there anything I missed in the configuration process?
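For comparison, this is the general shape I understood a failover service should take in cluster.conf; it is a simplified sketch with made-up names, not my actual file:
=====cluster.conf sketch=====
<rm>
  <failoverdomains>
    <failoverdomain name="webdomain" ordered="1" restricted="1">
      <failoverdomainnode name="node1" priority="1"/>
      <failoverdomainnode name="node2" priority="2"/>
    </failoverdomain>
  </failoverdomains>
  <service name="webservice" domain="webdomain" autostart="1">
    <ip address="192.168.0.111" monitor_link="1"/>
    <script name="httpd" file="/etc/rc.d/init.d/httpd"/>
  </service>
</rm>
=====END=====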
Many thanks,
Dicky
Hi,
What is output to the "/var/log/messages" files of each node? That should provide a clue as to what the problem is.
Also, did you install the 'fence' RPM and any Clustered LVM / GFS RPMs?
You also might consider rebooting the "downed" node - this function is generally taken care of by fencing devices automatically and, as I understand it, "manual fencing" means you gotta reboot :), the assumption being that a failed node won't be allowed back in the cluster until it's restarted.
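With manual fencing, once the failed node has been power-cycled the fence also has to be acknowledged by hand on a surviving node - from memory it goes something like the sketch below (fence_ack_manual ships with the fence RPM; the node name is just an example):
=====example=====
# acknowledge a pending manual fence after the failed node is rebooted
fence_ack_manual -n node2
=====END=====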
Thanks,
Jon
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster