On Tue, Mar 01, 2011 at 06:50:18PM +0530, Parvez Shaikh wrote:
> Hi Ryan,
>
> Thank you for response. Does it mean there is no way to intimate
> administrator about failure of fencing as of now?
>
> Let me give more information about my cluster -
>
> I have set of nodes in cluster with only IP resource being protected. I have
> two levels of fencing, first bladecenter fencing and second one is manual
> fencing.

If the problem you have with fence_bladecenter is intermittent (for
example, if it fails half the time), fence_manual is going to *detract* from
your cluster's ability to recover automatically.

Ordinarily, if a fencing action fails, fenced will automatically retry the
operation.

When you configure fence_manual as a backup, this retry will *never* occur,
meaning your cluster hangs until someone acknowledges the fence by hand.

> At times if machine is already down (either power failure or turned off
> abruptly); blade center fencing times out and manual fencing happens. At this
> time, administrator is expected to run fence_ack_manual.
>
> Clearly this is not something which is desirable, as downtime of services is
> as long as administrator runs fence_ack_manual.
>
> What is recommended method to deal with blade center fencing failure in
> this situation? Do I have to add another level of fencing (between blade
> center and manual) which can fence automatically (not requiring manual
> interference)?

Start with removing fence_manual.

If fencing is failing (permanently), you can still run:

    fence_ack_manual -e -n <nodename>

> > > my bladecenter fencing agent, I sometimes get message saying bladecenter
> > > fencing failed because of timeout or fence device IP address/user
> > > credentials are incorrect.

^^ This is why I think fence_manual is, in your specific case, very likely
hurting your availability.

-- Lon Hohberger - Red Hat, Inc.