Thank you for the reply.
What I gathered from your response is that I should remove manual fencing right away. This will cause the fence daemon (fenced) to keep retrying fence_bladecenter until the node is fenced. Most likely fenced will succeed in fencing the failed node (provided the IP address, user name and password for the BladeCenter management module are correct), even if it times out the first time. Am I right?
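To put rough numbers on "even if it times out the first time", here is a small sketch under a loudly hypothetical assumption: each fence_bladecenter attempt fails independently with the same probability p_fail. The chance that n consecutive attempts all fail is then p_fail**n. With the "fails 1/2 the time" example from Lon's reply quoted below, five attempts leave only a 1/32 (about 3%) chance that the node is still unfenced. The function name prob_still_unfenced is illustrative only and is not part of any cluster tool.

```python
def prob_still_unfenced(p_fail: float, attempts: int) -> float:
    """Chance that all `attempts` tries fail, assuming each attempt fails
    independently with probability `p_fail` (a hypothetical model)."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return p_fail ** attempts


if __name__ == "__main__":
    # "Fails 1/2 the time": how likely is the node still unfenced after n tries?
    for n in (1, 2, 3, 5, 10):
        print(f"after {n:2d} attempts: {prob_still_unfenced(0.5, n):.4f}")
```

Real failures are probably not independent: a wrong IP address or password fails every time, which corresponds to p_fail = 1.0. In that case retrying never helps, and that is the "failing (permanently)" situation where fence_ack_manual -e -n <nodename> is still available.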
I will try removing manual fencing and see how things go.
>> If fencing is failing (permanently), you can still run:
>> fence_ack_manual -e -n <nodename>
By the way, as I understand it, fence_ack_manual -n <node name> can only be used to acknowledge a node that was fenced manually (not one fenced by fence_bladecenter); please correct me if this is wrong. So, God forbid, if fence_bladecenter fails for some reason, we still have the option to run fence_manual and then fence_ack_manual to get the cluster working again.
Thanks again, and have a great weekend ahead.
Yours truly,
Parvez
On Fri, Mar 4, 2011 at 10:45 PM, Lon Hohberger <lhh@xxxxxxxxxx> wrote:
On Tue, Mar 01, 2011 at 06:50:18PM +0530, Parvez Shaikh wrote:
> Hi Ryan,
>
> Thank you for the response. Does it mean there is currently no way to
> notify the administrator about a fencing failure?
>
> Let me give more information about my cluster -
>
> I have a set of nodes in the cluster with only an IP resource being
> protected. I have two levels of fencing: the first is bladecenter fencing
> and the second is manual fencing.
If the problem you have with fence_bladecenter is intermittent - for
example, if it fails 1/2 the time, fence_manual is going to *detract*
from your cluster's ability to recover automatically.
Ordinarily, if a fencing action fails, fenced will automatically retry
the operation.
When you configure fence_manual as a backup, this retry will *never*
occur, meaning your cluster hangs.
Start by removing fence_manual.
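The retry behaviour described above can be modelled roughly as follows. This is a hypothetical simulation, not fenced's actual code: fence levels are tried in order, a level that fails hands over to the next one, and when every level has failed the loop starts over from the first. A fence_manual-style level is modelled as never finishing on its own (WAITING_FOR_ACK), so once it is reached nothing else is retried until an operator acknowledges. The agent functions, outcome names and the max_rounds cap exist only for this sketch.

```python
from typing import Callable, List, Optional, Tuple

# Outcomes a fence level can report in this simplified model.
SUCCESS = "success"
FAILURE = "failure"
WAITING_FOR_ACK = "waiting_for_ack"  # blocks until a human acknowledges

Agent = Callable[[int], str]  # receives the round number, returns an outcome


def run_fencing(levels: List[Tuple[str, Agent]],
                max_rounds: int) -> Tuple[str, Optional[str], int]:
    """Try each fence level in order, starting over when all of them fail.

    Returns (outcome, level name, round number). max_rounds only keeps the
    simulation finite; the model otherwise keeps retrying.
    """
    for round_no in range(1, max_rounds + 1):
        for name, agent in levels:
            outcome = agent(round_no)
            if outcome == SUCCESS:
                return SUCCESS, name, round_no
            if outcome == WAITING_FOR_ACK:
                # A manual level never reports failure, so the loop never
                # gets back to the automatic level: the cluster waits.
                return WAITING_FOR_ACK, name, round_no
            # FAILURE: fall through to the next level.
        # Every level failed this round: retry from the first level.
    return FAILURE, None, max_rounds


def flaky_bladecenter(round_no: int) -> str:
    """Hypothetical agent: times out on the first try, works on the retry."""
    return FAILURE if round_no == 1 else SUCCESS


def manual(round_no: int) -> str:
    """Hypothetical stand-in for fence_manual: never completes by itself."""
    return WAITING_FOR_ACK


if __name__ == "__main__":
    only_blade = [("bladecenter", flaky_bladecenter)]
    blade_then_manual = [("bladecenter", flaky_bladecenter), ("manual", manual)]
    print(run_fencing(only_blade, max_rounds=5))         # ('success', 'bladecenter', 2)
    print(run_fencing(blade_then_manual, max_rounds=5))  # ('waiting_for_ack', 'manual', 1)
```

With only the flaky BladeCenter level, the second round succeeds without anyone's help; adding the manual level after it turns the same single timeout into a wait for fence_ack_manual.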
> At times, if the machine is already down (either because of a power failure
> or because it was turned off abruptly), bladecenter fencing times out and
> manual fencing happens. At this point, the administrator is expected to run
> fence_ack_manual.
> Clearly this is not desirable, as services stay down for as long as it
> takes the administrator to run fence_ack_manual.
> What is the recommended method to deal with bladecenter fencing failure in
> this situation? Do I have to add another level of fencing (between
> bladecenter and manual) which can fence automatically (not requiring manual
> intervention)?
If fencing is failing (permanently), you can still run:
fence_ack_manual -e -n <nodename>
> > > my bladecenter fencing agent, I sometimes get message saying bladecenter
> > > fencing failed because of timeout or fence device IP address/user
> > > credentials are incorrect.
^^ This is why I think fence_manual is, in your specific case, very
likely hurting your availability.
--
Lon Hohberger - Red Hat, Inc.
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster