What if the fence device doesn't work?

Hi!

I started wondering what happens if my fence device is broken. The
scenario:

 - a node (running a service) fails
 - another node notices the lost heartbeats and tries to fence the failed
   node
 - however, the fence device doesn't respond
 - ...what now?

I tried to simulate the situation on our test cluster of two HP Blade
servers, which uses iLO fencing, by misconfiguring the fencing agent to
authenticate to the iLO with a wrong username. What happens is that the
fenced daemon on the surviving node tries to fence the failed node over
and over again, and the service I'm trying to fail over never leaves the
state "Started" on node "Unknown"... that is, the cluster won't fail it
over to the running node.
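
To make the behaviour I'm seeing concrete, here is a rough Python model
of it. This is only a sketch of what I observe, not the real fenced
code: the names fence_node, recover_service, broken_ilo and
working_agent are made up, and max_attempts exists only so that the
example terminates (the real daemon seems to retry forever).

```python
# Toy model of the observed behaviour; NOT the real fenced implementation.
# Every name below is hypothetical and chosen only for illustration.

def fence_node(agent, node):
    """Call a fencing agent; True means it confirmed the node is off."""
    return agent(node)

def recover_service(agent, failed_node, max_attempts=5):
    """Retry fencing; allow failover only after a confirmed fence.

    max_attempts is an assumption added so this sketch stops. In my
    test the real daemon kept retrying without any limit.
    """
    for attempt in range(1, max_attempts + 1):
        if fence_node(agent, failed_node):
            return "failed over after attempt %d" % attempt
    # The fence was never confirmed, so the service stays blocked.
    return "service still blocked: node not fenced"

def broken_ilo(node):
    # Stands in for the iLO agent configured with a wrong username.
    return False

def working_agent(node):
    # Stands in for a fence device that does respond.
    return True

if __name__ == "__main__":
    print(recover_service(broken_ilo, "node1"))
    print(recover_service(working_agent, "node1"))
```

With broken_ilo the model never gets past the fencing step, which is
the situation I end up in; with working_agent it fails over on the
first attempt.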

Not good. If the active node and its fence device fail at the same
time - for example, if the active node is a Xen guest and its Xen host
fails, or if the active node loses power because the network power
switch fails or because the iLO gets confused - the service is lost.
The Xen scenario doesn't even seem too far-fetched...

Am I missing something?


--Janne Peltonen
Univ. of Helsinki
mail admin

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
