Re: I give up

On Wed, 28 Nov 2007, Scott Becker wrote:

>> I do want to disagree strongly, however, with your blanket suggestion
>> that this software is not complete, and is not a cluster solution. It
>> is a solution for many, many users... not all of whom are RH customers.
>> It is just not a solution for you, my friend.
>>
>> Thanks for your many constructive comments. I hope you keep trying the
>> software - we are here to help as best we can. I haven't given up on
>> you *quite* yet! :)


> To recap my problem, from an earlier post:
>
> I just performed a test which failed miserably. I have two nodes (node 2
> and node 3) and did a test of a NIC failure, expecting a fencing race
> with a good outcome. The good node did not attempt to fence the bad node
> (although the bad one did make an attempt, as expected). At the same
> time, it also did not take over the service (really bad).
> ...

The only time I've had an issue like this was when I had my fencing and failover domain incorrectly configured. I had set up my fencedevices, but hadn't associated them with any of my clusternodes. Any chance you can post your cluster.conf, sans passwords? (Assuming you haven't already posted it - I didn't find it in a quick search.)
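For reference, here's a minimal sketch of the association I mean. The node names, device name, address, and credentials are all made up; the point is that each <clusternode> needs a <fence> block referencing a <fencedevice> by name:

    <clusternodes>
      <clusternode name="node2" nodeid="2" votes="1">
        <fence>
          <method name="1">
            <!-- "apc-switch" must match a fencedevice name below -->
            <device name="apc-switch" port="2"/>
          </method>
        </fence>
      </clusternode>
      <clusternode name="node3" nodeid="3" votes="1">
        <fence>
          <method name="1">
            <device name="apc-switch" port="3"/>
          </method>
        </fence>
      </clusternode>
    </clusternodes>
    <fencedevices>
      <fencedevice agent="fence_apc" name="apc-switch"
                   ipaddr="10.0.0.50" login="apc" passwd="secret"/>
    </fencedevices>

Without the <fence> section under each <clusternode>, the fence devices exist but nothing tells fenced how to fence a given node - which looks a lot like what you're describing.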

> Only after I reconnected the NIC's cable did it reject the improperly
> joining node (good) and recover the service (too late). Normal luci
> two-node configuration. It's broken. From a prior post in the "Service
> Recovery Failure" thread:

> How do I turn up the verbosity of fenced? I'll repeat the test. The
> only mention I can find is -D, but I don't know how to use it. I'll
> browse the source and see if I can learn anything. I'm using 2.0.73.
> ...
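I haven't needed to crank up fenced's logging myself, but -D should keep fenced in the foreground and print its debug output to the terminal (worth confirming against the fenced man page on 2.0.73). Roughly, on one node of a test cluster:

    # stop the daemonized copy, then run it by hand in the foreground
    killall fenced
    fenced -D

Bear in mind that node can't fence anything while the daemon is down, so I'd only do this on a cluster you can afford to break.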

> The failover failed. My fence_apc hack worked great. If I could turn up
> the verbosity of fenced, I would keep trying to figure this out. It's
> possible that if I stopped the cluster, rebooted everything, and brought
> it back up, my test might succeed. But I still can't trust it for
> production.

> Several others have said 4.5 and 4.6 work great for them, but 5.0 and
> 5.1 malfunction.

The only problem I've had with 5.0 was getting it initially set up. Once I applied the available patches (pre-5.1), everything went smoothly for me. But then, I wasn't trying to use a quorum disk, which I've heard can be tricky to set up correctly, particularly if you're using clvm. The only time I've seen quorum disks work correctly was way back with RHAS 2.1 - but I also haven't tried setting that up since then.
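For anyone who does go down the quorum-disk road, the shape of it (pieced together from the docs - I haven't run this on 5.x myself, so treat every value as a placeholder) is: label a small shared partition, then describe it in cluster.conf. Something like:

    # run once from one node: write the qdisk label to shared storage
    mkqdisk -c /dev/sdc1 -l my_qdisk

and then, inside <cluster> in cluster.conf:

    <quorumd interval="1" tko="10" votes="1" label="my_qdisk">
      <!-- heuristic: a node must be able to ping the gateway to keep
           its quorum-disk vote -->
      <heuristic program="ping -c1 -w1 10.0.0.254" score="1" interval="2"/>
    </quorumd>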

Regards,

James Chamberlain

