On Tue, 2006-12-12 at 10:10 -0700, Daryl Fenton wrote:
> Right now we have 2 HP blade servers (Blade1 and Blade3) running Red Hat
> AS 4U4 and Cluster Suite 4; they are both accessing LVMs on our EMC
> CX700 SAN. Presently we have a 350 GB ext3 LVM and a 350 GB GFS LVM that
> they are trying to share using Cluster Suite and NFS. The issue comes up
> when we are running tests on our ext3 NFS share. When we take down one
> of the HBA connections to Blade1, multipath kicks in and everything
> works fine, but when we disable all of the HBA connections on Blade1,
> quorum notices that Blade1 can't access the qdisk and the cluster fences
> Blade1, which causes it to reboot itself. The problem is that when
> Blade1 comes back up, it can't find its quorum disk since the HBA is
> down. Since cman is needed for quorum, cman fires up fine and Blade1
> joins the cluster. The next service to start is qdiskd, which fails
> since Blade1's HBA is down and it can't see the quorum disk. Once
> everything is started, Blade1 tries to get its services back from the
> cluster and fails them since its HBA is down, and then it just sits
> there in the failed state until manual intervention. Is there a way to
> keep Blade1 from joining the cluster while its HBA is still down, or, if
> it does join the cluster, to tell it to fence itself / not accept any
> services?

There's a bug open about this; we're still trying to figure out the best
way to handle it without breaking backward compatibility. I would expect
a (testable) fix to be constructed this week.

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=216092

-- Lon
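[Editor's addendum, not part of the original thread or the bug report: qdiskd can run heuristics configured under the <quorumd> section of cluster.conf, and a heuristic that fails while no FC path is alive would keep a node whose HBAs are down from scoring well enough to hold services. Below is a minimal, hypothetical sketch of such a heuristic in Python; it assumes the HBAs expose a port_state attribute under /sys/class/fc_host, and the script path and score/interval values mentioned afterwards are purely illustrative.]

#!/usr/bin/env python
# Illustrative qdiskd heuristic (assumption, not from the thread):
# exit 0 only if at least one Fibre Channel HBA reports an online
# port, otherwise exit non-zero so qdiskd can score the node down
# while its storage paths are dead.
import glob
import sys

def fc_path_up():
    # Each FC HBA appears as /sys/class/fc_host/hostN and exposes a
    # port_state attribute ("Online", "Linkdown", ...).
    for state_file in glob.glob("/sys/class/fc_host/*/port_state"):
        try:
            state = open(state_file).read().strip()
        except IOError:
            continue
        if state == "Online":
            return True
    return False

if __name__ == "__main__":
    sys.exit(0 if fc_path_up() else 1)

[Wiring it up would be something along the lines of a
<heuristic program="/usr/local/sbin/fc_path_up.py" score="1" interval="2"/>
entry under <quorumd> in cluster.conf; exact attributes and tuning should be checked against the qdisk(5) man page for the installed release.]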