Re: really reliable?

On Tue, Apr 14, 2009 at 12:17:44PM -0400, Ryan Golhar wrote:
> Is redhat cluster suite really reliable?  I've been having so much 
> trouble getting a cluster up and running,

Problems getting a cluster up are common.  They usually come down to network
issues and are very difficult to diagnose, because the cluster software
produces almost indecipherable errors and strange behavior when the network
isn't behaving as expected.

My usual suggestion is to disable the cman init script and just run
"ccsd; cman_tool join" on the nodes.  Then watch the output of
"cman_tool nodes" and "cman_tool status", observing how long it takes
the nodes to recognize each other.  If it takes more than a few seconds for
a steady-state cluster membership to form, you may have some network
problems.
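
As a rough illustration of the timing check described above, here is a minimal
Python sketch that polls "cman_tool nodes" and reports how long it takes until
the expected number of nodes show up as members.  It assumes the second column
of the output is a status field in which "M" marks a joined member; check that
against the output on your own nodes.  The function names, the timeout and the
polling interval are made up for the example.

```python
import subprocess
import time


def count_members(nodes_output):
    """Count rows whose second column is 'M' (ASSUMED to mean 'member')."""
    count = 0
    for line in nodes_output.splitlines():
        fields = line.split()
        # Skip the header row: data rows are assumed to start with a node id.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1] == "M":
            count += 1
    return count


def run_cman_nodes():
    """Return whatever `cman_tool nodes` prints on this node."""
    result = subprocess.run(
        ["cman_tool", "nodes"], capture_output=True, text=True, check=False
    )
    return result.stdout


def seconds_until_members(expected, get_output=run_cman_nodes, timeout=60.0,
                          interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll until `expected` members are listed.

    Returns the elapsed seconds, or None if the timeout passes first.
    """
    start = clock()
    while True:
        members = count_members(get_output())
        elapsed = clock() - start
        if members >= expected:
            return elapsed
        if elapsed >= timeout:
            return None
        sleep(interval)
```

On a three-node cluster, for example, calling seconds_until_members(3) right
after "cman_tool join" returns the elapsed seconds, or None if the members
never all appear within the timeout.  A result of more than a few seconds
would be a hint to look at the network.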

To successfully administer a cluster, you really need to be proficient in
using cman_tool to start up, monitor and shut down the nodes.  The cman init
script does a bunch of things for you, which is great when everything is
working, but when something doesn't work the init script can leave a big
complicated mess that's impossible to sort out.


> I've installed just the bare minimum (before even getting to GFS) to 
> test the cluster software.  Just starting cman cluster services fails on 
> two of the nodes.

That's the right approach, but as mentioned above, you probably need to pare
things down to just using cman_tool if network problems are at the root.

> Even when I try to reboot the nodes, I can't because the whole system 
> hangs on various processes that don't ever shut down.  I have to 
> physically reboot these boxes.

If something has gone wrong, it's often impossible to shut down without a hard
reboot.  Even when things are working, rebooting can be a delicate task
because the system may easily be configured to stop things in the wrong order,
and one thing out of place can cause a wreck.
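
To make the ordering point concrete, here is a small Python sketch that stops
services in the reverse of the order they were started and halts at the first
failure instead of carrying on.  The service names in EXAMPLE_START_ORDER are
only an assumption about what a node might be running; substitute whatever is
actually installed.  The helper names are made up for the example.

```python
import subprocess

# ASSUMED start order, for illustration only; adjust to the real node.
EXAMPLE_START_ORDER = ["cman", "clvmd", "gfs", "rgmanager"]


def stop_commands(start_order):
    """Build the stop commands in the reverse of the start order."""
    return [["service", name, "stop"] for name in reversed(start_order)]


def stop_all(start_order, run=subprocess.call):
    """Run the stop commands one by one.

    Returns None if every command exits 0, otherwise the first command that
    failed.  Nothing below a failed layer is touched.
    """
    for cmd in stop_commands(start_order):
        if run(cmd) != 0:
            return cmd
    return None
```

Halting at the first failure is deliberate: stopping a lower layer while
something above it is still running is exactly the kind of out-of-place step
that can leave the node hung.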

Dave

