On Tuesday 12 June 2007 03:29:00 Manish Kathuria wrote:
> On 6/11/07, Robert Gil <Robert.Gil@xxxxxxxxxxxxxx> wrote:
> > If ilo itself is off, fencing doesn't work.
>
> Isn't there any timeout setting such that if the ILO doesn't respond
> for a certain amount of time, it is treated as fenced and the node is
> considered to be dead and the failover takes place?

As far as I remember, there is only a TCP timeout when establishing the
connection to the iLO card, and it takes a very long time to occur (it is
a default setting and takes minutes). I'm not sure how or where to set it.
But we have had this discussion (especially about iLO cards) nearly every
time we used them, and for that and other reasons we had to build our own
fence_ilo agent. I'm quite sure that we solved the timeout problem in the
end. The timeout is set to 10 seconds by default (Config.timeout).

You can find the agent at
http://download.atix.de/yum/comoonics/productive/noarch/RPMS/comoonics-bootimage-fenceclient-ilo-0.1-16.noarch.rpm
or use the yum/up2date channel directly, as described here:
http://www.open-sharedroot.org/faq/can-i-use-yum-or-up2date-to-install-the-software/
Then install "comoonics-bootimage-fenceclient-ilo" and there you go.

> > Did you add ilo as a fence device? And create a user? You create a user
> > in the ilo for that blade, not on the chassis. You have to reboot the
> > blade to get to the ilo manager.
>
> Yes, had added respective ILOs as fence devices for both the servers
> and created users also.

We are doing so as well: always a power user for iLO devices. We are also
automating this with the iLO client. There is an undocumented switch, -x,
in the fence_ilo client referenced above. You point it at a file that
might look as follows, and you'll have your user.

<USER_INFO MODE="write">
  <ADD_USER USER_NAME="power" USER_LOGIN="power" PASSWORD="the_password">
    <ADMIN_PRIV value="N"/>
    <REMOTE_CONS_PRIV value="N"/>
    <RESET_SERVER_PRIV value="Y"/>
    <VIRTUAL_MEDIA_PRIV value="N"/>
    <!-- Firmware support information for next tag: -->
    <!-- iLO 2 - All versions. -->
    <!-- iLO - All versions. -->
    <!-- RILOE II - None. -->
    <CONFIG_ILO_PRIV value="Yes"/>
    <!-- Firmware support information for next 3 tags: -->
    <!-- iLO 2 - None. -->
    <!-- iLO - None. -->
    <!-- RILOE II - All versions. -->
    <!--
    <CONFIG_RILO_PRIV value="Y"/>
    <LOGIN_PRIV value="Y"/>
    <CLIENT_RANGE value="10.10.10.1 - 254.255.255.255"/>
    -->
    <!-- Firmware support information for next 6 tags: -->
    <!-- iLO 2 - None. -->
    <!-- iLO - Version 1.40 and earlier. -->
    <!-- RILOE II - None. -->
    <!--
    <VIEW_LOGS_PRIV value="Yes"/>
    <CLEAR_LOGS_PRIV value="Yes"/>
    <EMS_PRIV value="Yes"/>
    <UPDATE_ILO_PRIV value="No"/>
    <CONFIG_RACK_PRIV value="Yes"/>
    <DIAG_PRIV value="Yes"/>
    -->
  </ADD_USER>
</USER_INFO>

> I just want to make sure that automatic fencing happens and failover
> takes place even when there is a complete power failure for one node

Even if the timeout works, you'll also need a second fence mechanism.
You might think about using fence_manual as a last resort, to bring the
cluster back online after a power failure once someone has intervened
manually.

Regards
Marc.

> > -----Original Message-----
> > From: linux-cluster-bounces@xxxxxxxxxx
> > [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Manish Kathuria
> > Sent: Monday, June 11, 2007 12:45 PM
> > To: linux clustering
> > Subject: Re: Problems with Cluster
> >
> > On 6/11/07, Maciej Bogucki <maciej.bogucki@xxxxxxxxxxxxx> wrote:
> > > Manish Kathuria wrote:
> > > > We want the failover to happen when the power supply fails to either
> > > > of the nodes. In order to test the scenario, we removed the power
> > > > cables from one of the nodes.
> > > > However the failover did not happen
> > > > and upon observing the logs we found that the alive node could not
> > > > connect to the fence device (ILO in this case) of the dead node
> > > > since it was powered off and the fencing could not take place. Does
> > > > this mean that we would not be able to have a failover in case of
> > > > power failure for one of the nodes? Is there a way we can do it?
> > > > How is the cluster supposed to react when the ILO itself is powered
> > > > off?
> > >
> > > You need to perform manual fencing (administrator reaction) when it
> > > happens.
> >
> > Isn't there any way which is automated and does not require manual
> > intervention? Otherwise, the whole purpose gets defeated.
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster

--
Gruss / Regards,

Marc Grimme
Phone: +49-89 452 3538-14
http://www.atix.de/ http://www.open-sharedroot.org/

**
ATIX - Ges. fuer Informationstechnologie und Consulting mbH
Einsteinstr. 10 - 85716 Unterschleissheim - Germany

Registergericht: Amtsgericht München
Registernummer: HRB 131682
USt.-Id.: DE209485962
Geschäftsführung: Marc Grimme, Mark Hlawatschek, Thomas Merz
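
Illustrative sketch (not part of the original post, and not the code of the comoonics fence_ilo agent): the two ideas discussed in this thread are a short connect timeout towards the iLO and a second fence method to fall back on. The Python below shows roughly how they could fit together. The function names, the port 443 default, and the (name, callable) method list are assumptions made for illustration. Only the 10-second default mirrors the Config.timeout value mentioned above. Note that an unreachable iLO is not treated as a successful fence here. It only lets the caller fail quickly and move on to the next method.

```python
import socket


def ilo_reachable(host, port=443, timeout=10.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Port 443 is an assumption; the 10 s default mirrors the Config.timeout
    value mentioned in the thread.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def fence_with_fallback(node, methods):
    """Try each (name, callable) fence method in order.

    Each callable is hypothetical: it takes the node name and returns True
    when the node was fenced. Returns the name of the first method that
    succeeded, or None if every method failed.
    """
    for name, method in methods:
        try:
            if method(node):
                return name
        except Exception:
            # A failing agent must not stop the remaining methods from running.
            continue
    return None
```

A caller might pass something like [("ilo", fence_via_ilo), ("manual", fence_manually)], where both callables are placeholders for whatever actually drives the agents. The real cluster stack configures this ordering in its own configuration rather than in code like this.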