On Mon, 2006-11-13 at 13:04 -0500, Patel, Tushar wrote:
> Hello,
>
> We have a lot of AS 4.0 GFS 6.0 clusters in our firm working fine.
> However, we have recently been upgrading our clusters to GFS 6.1.
>
> What we have seen so far is that once we upgrade to GFS 6.1, our
> failover test fails.
>
> So far we have conducted failover tests using 2 scenarios:
> 1.) Manually rebooting one of the nodes in the cluster through the
>     HP-iLO interface, sending the fence_ilo command from one of the
>     nodes. The command works fine and the host does reboot.
> 2.) Pulling out the network cables of one of the hosts in the cluster.
>
> We have a 4-node cluster.
>
> The problem is that with either of the above tests, gfs hangs and
> clustat starts reporting only member information, with no service
> information. clustat displays the following message:
> "Timeout : Resource Manager not responding"
>
> gfs just keeps hanging until we manually intervene and bring
> the halted node up.
>
> It seems fencing is not working or rgmanager is flawed and cannot
> process/parse information.

If fencing breaks, rgmanager will hang...

Look at /proc/cluster/services -- if you see the fence domain in the
'recover' state, fencing needs to be fixed.

-- Lon
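
For anyone who wants to script the check above, here is a rough Python sketch. It assumes that /proc/cluster/services prints one line per service beginning with "Fence Domain:" and that the state ("run", "recover", ...) appears on that line as its own whitespace-separated column. That layout is an assumption, so verify it against the output on your own nodes. The function name fence_domains_in_state is made up for this example and is not part of any cluster tool.

```python
SERVICES_PATH = "/proc/cluster/services"


def fence_domains_in_state(text, state="recover"):
    """Return the fence-domain lines whose columns include `state`.

    Assumes (not verified here) that each fence domain is reported on a
    single line starting with "Fence Domain" and that the state is a
    separate whitespace-delimited column on that line.
    """
    hits = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped.lower().startswith("fence domain"):
            continue
        # Compare whole columns so a quoted domain name cannot match.
        if state in stripped.split():
            hits.append(stripped)
    return hits


if __name__ == "__main__":
    try:
        with open(SERVICES_PATH) as handle:
            contents = handle.read()
    except OSError as exc:
        print(f"cannot read {SERVICES_PATH}: {exc}")
    else:
        stuck = fence_domains_in_state(contents)
        if stuck:
            print("fence domain in 'recover' state, fencing needs to be fixed:")
            for entry in stuck:
                print("  " + entry)
        else:
            print("no fence domain reported in 'recover' state")
```

Run it on a surviving node while the cluster is hung. If it reports a fence domain in 'recover', look at the fence agent configuration before suspecting rgmanager.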