As mentioned, this version of the iLO firmware caused some issues for cluster admins because additional features/commands were incorporated. This topic was discussed at the Red Hat Summit: a single "COLD_BOOT_SERVER" command performs a power off, waits 4 seconds, and then cold boots the server. This directive was suggested as a replacement for the "HOLD_PWR_BTN" directive in the scripts.

Greg Caetano
Hewlett-Packard Company
ESS Software Platform & Business Enablement Solutions Engineering
Chicago, IL
greg.caetano@xxxxxx
Red Hat Certified Engineer
RHCE#805007310328754

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Kevin Anderson
Sent: Wednesday, October 15, 2008 4:42 PM
To: linux clustering
Subject: RE: RE: Fencing quandry

On Wed, 2008-10-15 at 17:38 -0400, jim parsons wrote:
> On Wed, 2008-10-15 at 20:45 +0000, Hofmeister, James (WTEC Linux) wrote:
> > Hello Jeff,
> >
> > RE: RE: Fencing quandary
> >
> > The root issue is that the iLO scripts are not up to date with the current firmware rev in the c-Class and p-Class blades.
> >
> > The method of '<device name="ilo01"/>' for a "reboot" is not working with this iLO firmware rev, and the workaround is to send 2 commands to iLO under a single method: 'action="off"/' and 'action="on"/'.
> >
> > I have tested this with my p-Class blades and it was successful. I am still waiting for my customer's test results on their c-Class blades.
> >
> > ...yes, this is the root issue behind the iLO problem, but it does not completely address your concern. I believe you are saying that RHCS does not accept a "power off" as a fence, but requires a "power off" followed by a "power on".
> Right. It is failing because the 'power on' portion does not complete:
> the fence agent is unable to send the correct power on command.
> But the point is, even if the power on command fails, the fencing agent
> should report success, since the real need is to ensure the machine is no
> longer participating in the cluster, not to bring it back up.

So, is it proper to report success if part of the request fails, as long as the critical part succeeds?

Kevin

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
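The policy Jim proposes, and Kevin questions, can be sketched in Python as below. This is a hypothetical illustration, not the code of any real fence agent: `fence_reboot`, its `power_off`/`power_on` callables (stand-ins for whatever actually talks to iLO), the `strict` flag, and the result types are all names invented for this sketch. Power-off is treated as the critical step; power-on is attempted afterwards and, unless `strict` is set, its failure is only reported as a warning.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class FenceResult(Enum):
    # Hypothetical result type for this sketch only.
    SUCCESS = 0
    FAILURE = 1


@dataclass
class RebootOutcome:
    result: FenceResult
    warning: Optional[str] = None


def fence_reboot(power_off: Callable[[], bool],
                 power_on: Callable[[], bool],
                 strict: bool = False) -> RebootOutcome:
    """Hypothetical reboot-style fence: power off (critical), then power on.

    strict=True models the behaviour described in the thread, where both
    steps must succeed; strict=False models the policy Jim proposes.
    """
    if not power_off():
        # The node may still be running and touching shared resources,
        # so this is always a fencing failure; power-on is never attempted.
        return RebootOutcome(FenceResult.FAILURE, "power off failed")
    if not power_on():
        if strict:
            return RebootOutcome(FenceResult.FAILURE, "power on failed")
        # The node is off and can no longer take part in the cluster:
        # report success, but surface the problem to the administrator.
        return RebootOutcome(FenceResult.SUCCESS,
                             "node fenced but power on failed")
    return RebootOutcome(FenceResult.SUCCESS)


if __name__ == "__main__":
    # Simulate the firmware problem from the thread: off works, on does not.
    outcome = fence_reboot(lambda: True, lambda: False)
    print(outcome.result.name, "-", outcome.warning)
    # prints "SUCCESS - node fenced but power on failed"
```

Passing `strict=True` reproduces the behaviour Jim describes, where a failed power-on fails the whole fence. The non-strict default only models his proposal; whether reporting success in that case is proper is exactly the open question Kevin raises.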