Right now we have two HP blade servers (Blade1 and Blade3) running Red Hat
AS 4 Update 4 and Cluster Suite 4, both accessing LVM volumes on our EMC
CX700 SAN. At present we have a 350 GB ext3 LVM volume and a 350 GB GFS LVM
volume that they are trying to share using Cluster Suite and NFS.

The issue shows up when we run tests on our ext3 NFS share. When we take
down one of the HBA connections to Blade1, multipath kicks in and
everything works fine; but when we disable all of the HBA connections on
Blade1, the quorum disk daemon notices that Blade1 can't access the qdisk
and the cluster fences Blade1, which causes it to reboot itself.
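
For what it's worth, this is roughly the kind of check we do from Blade1 to
confirm the quorum device really is gone (commands are for dm-multipath;
with PowerPath it would be powermt display instead):

    # scan for quorum disk labels; with every HBA path down this finds nothing
    mkqdisk -L

    # list the multipath maps; paths to the SAN LUNs show up as failed
    multipath -ll
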
The problem is that when Blade1 comes back up, it can't find its quorum
disk since the HBA is still down. cman, which the quorum logic depends on,
starts up fine and Blade1 joins the cluster. The next service to start is
qdiskd, which fails because Blade1's HBA is down and it can't see the
quorum disk. Once everything has started, Blade1 tries to take its services
back from the cluster, fails them since its HBA is down, and then just sits
there in the failed state until manual intervention.
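
For illustration, the manual recovery amounts to something along these
lines once the HBA paths are back (the service name here is just a
placeholder, not our real config):

    # restart the quorum disk daemon now that the LUN is visible again
    service qdiskd start

    # re-enable the failed service on Blade1 ("nfs-ext3" is a made-up name)
    clusvcadm -e nfs-ext3 -m blade1
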
Is there a way to keep Blade1 from joining the cluster while its HBA is
still down, or, if it does join, to have it fence itself / not accept any
services?
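
One idea we have been kicking around, sketched below, is a small guard run
before cman in the boot order that keeps the node out of the cluster
whenever the quorum-disk device isn't visible (untested, and the device
path is just a placeholder for our qdisk LUN). Or is there a cleaner,
supported way to do this?

    #!/bin/sh
    # hypothetical pre-cman guard: bail out if the quorum-disk device
    # isn't visible, so this node never rejoins without its storage
    QDISK_DEV=/dev/mapper/qdisk    # placeholder path for our qdisk LUN
    if [ ! -b "$QDISK_DEV" ]; then
        echo "quorum disk $QDISK_DEV not visible, not starting cman" >&2
        exit 1
    fi
    exit 0
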
Thanks,
Daryl Fenton