Re: Fencing problem

Eric Kerin <eric@xxxxxxxxxxx> · Mon, 29 May 2006 09:18:24 -0400

On Mon, 2006-05-29 at 10:41 +0200, Tomasz Koczorowski wrote:
> Hi,
>  
> I have a problem with RHCS 4 in two node configuration (wayne and
> eastwood).
> Service ucpgw is running on wayne and httpd on eastwood.
> Every node is a Sun V40z server, fencing is done by IPMI.
> During cluster tests I unplug both power cables from one server (wayne),
> thus simulating unexpected poweroff (IPMI interface is also unavailable
> while server is out of power).
> <SNIP>
> Is this cluster misconfigured or is it a bug in fenced/ccsd subsystem?
> How can I solve this problem?
> 

This is an inherent flaw in using the on-board control devices (iLO,
IPMI, etc.) as fence devices: when the node loses power, its fence
device goes down with it.  Since the remaining node(s) can't
successfully fence the failed node, they won't continue.

Fenced also can't assume the machine is already powered down, since it
could be a network problem keeping it from accessing the other node (and
its IPMI device).
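
To illustrate why the surviving node has to block, here is a rough
sketch of the decision in Python.  It is a toy model, not fenced's
actual code; FenceResult, try_fence and may_take_over are made-up names
used only for this example.

```python
from enum import Enum


class FenceResult(Enum):
    CONFIRMED_OFF = "confirmed off"
    NO_ANSWER = "fence device did not answer"


def try_fence(device_reachable: bool) -> FenceResult:
    # Success requires the device to answer and confirm the power-off.
    if device_reachable:
        return FenceResult.CONFIRMED_OFF
    return FenceResult.NO_ANSWER


def may_take_over(result: FenceResult) -> bool:
    # Only a confirmed power-off makes it safe to start the services.
    return result is FenceResult.CONFIRMED_OFF


# Both cases look the same from the surviving node: no answer from IPMI.
scenarios = {
    "node lost all power, IPMI dead with it": False,
    "network fault, node still up and running its service": False,
}
for name, reachable in scenarios.items():
    print(name, "->", may_take_over(try_fence(reachable)))
```

In the second scenario, taking over would leave the service running on
both nodes at once, which is why an unanswered fence request can't be
treated as a success.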

I use two network-accessible power controllers for fencing my cluster,
with each power supply hooked up to a different controller, which
provides redundant power paths.
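
One consequence of the two-controller setup is that a fence only counts
once both feeds are cut, since a dual-supply server keeps running on
either one.  A minimal sketch of that rule in Python follows.
PowerController is a hypothetical stand-in for a network power switch,
not a real fence agent or vendor API, and the outlet numbers are
invented.

```python
class PowerController:
    """Hypothetical network power switch, simulated in memory."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.outlets = {}  # outlet number -> True if powered

    def set_outlet(self, outlet, on):
        # An unreachable controller cannot change or confirm anything.
        if not self.reachable:
            return False
        self.outlets[outlet] = on
        return True

    def is_on(self, outlet):
        # Outlets are assumed powered until switched off.
        return self.outlets.get(outlet, True)


def node_has_power(feeds):
    # A dual-supply server stays up while at least one feed is live.
    return any(ctrl.is_on(outlet) for ctrl, outlet in feeds)


def fence_node(feeds):
    # Cut every feed, and succeed only if each controller confirmed
    # the request and no feed is left live.
    confirmed = [ctrl.set_outlet(outlet, False) for ctrl, outlet in feeds]
    return all(confirmed) and not node_has_power(feeds)


pdu_a, pdu_b = PowerController("pdu-a"), PowerController("pdu-b")
wayne_feeds = [(pdu_a, 1), (pdu_b, 1)]
print(fence_node(wayne_feeds))
```

Unlike on-board IPMI, the controllers stay reachable when the node
itself loses power, so the fence request can still be confirmed.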

Thanks, 
Eric Kerin
eric@xxxxxxxxxxx

--

Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster