Re: Arbitrary heuristics

Lon Hohberger <lhh@xxxxxxxxxx> · Mon, 26 Nov 2007 13:44:53 -0500

On Tue, 2007-11-20 at 14:06 -0800, Scott Becker wrote:
> I've been pondering what I'm actually looking for.
> 
> Each of my nodes has a public and a private NIC. Public is for serving 
> web pages, private is for fencing. I was desperately trying to get 
> fencing to work over the public network but I was faced with 
> reimplementing a complicated fence agent in C in order to use ssh 
> (supported OK by my power switches but difficult to add to the Python
> fence agent).
> 
> My remaining issue is that if I lose one of my public NICs, I must 
> ensure that the ensuing fencing race is won by the good node and not the 
> bad node, which thinks it's good. Quorum does not solve this because it
> must also work in a 'last man standing' case (starting with 3 nodes).
> 
> So pondering, I realized that I don't really need to monitor the ability 
> to reach the gateway. What I need is for a public comm error to create 
> an event, hence I use the public NIC for cluster comms. Then I need
> something that keeps the bad node from fencing the good nodes.
> 
> So assuming only one real failure at a time, I'm thinking of making the 
> first step in the fencing method a check for pinging the gateway. That 
> way when a node wants to fence, it will only be able to if its public
> NIC is working, even though it's using the private NIC for the rest of
> the fencing.

That's a pretty good + simple idea.
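
For illustration only, a minimal sketch of what such a first-step check
could look like. It assumes the check is listed as the first device in the
fencing method, that the devices in a method run in order and the method
fails as soon as one fails, and that the agent receives its options as
name=value lines on stdin. The option name "gateway", the function names,
and the address in the usage note are all made up for this example, and
"ping -c/-W" assumes the Linux iputils ping. Nothing here is an existing
agent.

```python
#!/usr/bin/env python
"""Hypothetical pre-fence check: succeed only if the gateway answers a ping."""
import subprocess
import sys


def parse_options(lines):
    """Parse name=value lines (as passed on stdin) into a dict."""
    opts = {}
    for line in lines:
        line = line.strip()
        # Skip blanks, comments, and anything that is not name=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        opts[name.strip()] = value.strip()
    return opts


def gateway_reachable(gateway, count=1, timeout=2, run=subprocess.call):
    """Return True if ping to the gateway exits with status 0."""
    # -c: number of probes, -W: seconds to wait for a reply (iputils ping).
    cmd = ["ping", "-c", str(count), "-W", str(timeout), gateway]
    return run(cmd) == 0


def main(stdin=sys.stdin, run=subprocess.call):
    """Exit status 0 lets the method continue; non-zero blocks the fence."""
    opts = parse_options(stdin)
    gateway = opts.get("gateway")  # hypothetical option name
    if not gateway:
        return 1  # no gateway configured: refuse rather than guess
    return 0 if gateway_reachable(gateway, run=run) else 1


# When installed as an agent script, the file would end with:
#     sys.exit(main())
```

Fed "gateway=192.0.2.1" on stdin, main() returns 0 only when the ping
succeeds, so a node whose public NIC is dead would fail this step and never
get as far as the power switch. The "run" parameter exists only so the ping
can be replaced with a stub when trying the logic out.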

-- Lon

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster