On Tue, Apr 14, 2009 at 12:17:44PM -0400, Ryan Golhar wrote:
> Is redhat cluster suite really reliable? I've been having so much
> trouble getting a cluster up and running,

Problems getting a cluster up are common, usually come down to network
issues, and are very difficult to diagnose. The cluster software
produces almost indecipherable errors and strange behaviors when the
network isn't behaving as expected.

My usual suggestion is to disable the cman init script, and just run
"ccsd; cman_tool join" on the nodes. Then watch the output of
"cman_tool nodes" and "cman_tool status", observing how long it takes
the nodes to recognize each other. If it takes more than a few seconds
for a steady-state cluster membership to form, you may have some
network problems.

To successfully administer a cluster, you really need to be proficient
in using cman_tool to start up, monitor and shut down the nodes. The
cman init script does a bunch of things for you, which is great when
everything is working, but when something doesn't work the init script
can leave a big complicated mess that's impossible to sort out.

> I've installed just the bare minimum (before even getting to GFS) to
> test the cluster software. Just starting cman cluster services fails on
> two of the nodes.

That's the right approach, but as mentioned above, if network problems
are at the root you probably need to pare things down to just using
cman_tool.

> Even when I try to reboot the nodes, I can't because the whole system
> hangs on various processes that don't ever shut down. I have to
> physically reboot these boxes.

If something has gone wrong, it's often impossible to shut down without
a hard reboot. Even when things are working, rebooting can be a
delicate task because the system may easily be configured to stop
things in the wrong order, and one thing out of place can cause a wreck.

Dave

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
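
The manual check suggested above (start the node by hand, then watch
"cman_tool nodes" to see how long membership takes to settle) could be
sketched roughly as follows. This is only a minimal sketch under stated
assumptions: it assumes "cman_tool nodes" prints one row per node with
the status in the second column and "M" meaning member, and the names
CMAN_TOOL, count_members and wait_for_members are made up for this
illustration.

```shell
#!/bin/sh
# Hypothetical helper: time how long cluster membership takes to form.
# CMAN_TOOL may be overridden (for example, with a stub when testing).
CMAN_TOOL=${CMAN_TOOL:-cman_tool}

# Count nodes reported as members.
# Assumption: the second column of "cman_tool nodes" is the status,
# and "M" marks a current member.
count_members() {
    "$CMAN_TOOL" nodes | awk '$2 == "M" { n++ } END { print n + 0 }'
}

# wait_for_members EXPECTED [TIMEOUT_SECONDS]
# Poll once per second until EXPECTED nodes are members or the timeout
# (default 30 seconds) expires.
wait_for_members() {
    expected=$1
    timeout=${2:-30}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if [ "$(count_members)" -ge "$expected" ]; then
            echo "membership reached $expected nodes after ${elapsed}s"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "fewer than $expected members after ${timeout}s; check the network" >&2
    return 1
}
```

On a three-node cluster one might run "ccsd; cman_tool join" on each
node and then "wait_for_members 3 30" on one of them. Anything much
beyond a few seconds is a hint to look at the network before going
further.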