Re: Error while manual fencing and output of clustat

Parvez Shaikh <parvez.h.shaikh@xxxxxxxxx> · Tue, 11 Jan 2011 11:15:36 +0530

Thanks Xavier.

It resolved the error on fencing.
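
For anyone hitting the same "no node name" error, here is a minimal sketch of a check for it, assuming Python and its standard xml.etree.ElementTree module. The function name nodes_missing_nodename and the sample configuration are hypothetical, modeled on the example quoted below. node1 deliberately lacks the "nodename" attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical cluster.conf fragment modeled on the example in this thread.
# node1 deliberately omits the nodename attribute on its fence device.
SAMPLE_CONF = """\
<cluster name="example" config_version="1">
  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <fence>
        <method name="1">
          <device name="my_fence_manual"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <fence>
        <method name="1">
          <device name="my_fence_manual" nodename="node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_manual" name="my_fence_manual"/>
  </fencedevices>
</cluster>
"""


def nodes_missing_nodename(conf_xml):
    """Return names of cluster nodes whose fence_manual device has no nodename."""
    root = ET.fromstring(conf_xml)
    # Names of all fence devices backed by the fence_manual agent.
    manual = {fd.get("name") for fd in root.iter("fencedevice")
              if fd.get("agent") == "fence_manual"}
    missing = []
    for node in root.iter("clusternode"):
        for dev in node.iter("device"):
            if dev.get("name") in manual and not dev.get("nodename"):
                missing.append(node.get("name"))
                break  # report each node at most once
    return missing


if __name__ == "__main__":
    print(nodes_missing_nodename(SAMPLE_CONF))
```

To check a real configuration, the contents of cluster.conf could be read from disk and passed in instead of SAMPLE_CONF.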

However, I am still grappling with how to find the name of the failed
cluster node from the other cluster node, the one to which the failed
node's service has failed over.

I was using the output of "clustat -x -S service name" and parsing the
XML to obtain the value of the "last_owner" field, but after failover
it reports the surviving node rather than the failed one.
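
A minimal sketch of that parsing step, assuming Python and its standard xml.etree.ElementTree module (chosen here only for illustration; the function name service_owners is hypothetical). The sample string is the clustat -x output quoted further down in this thread, with the XML declaration left out.

```python
import xml.etree.ElementTree as ET

# Sample taken from the clustat -x output quoted later in this thread.
SAMPLE = """\
<clustat version="4.1.1">
  <groups>
    <group name="service:S" state="111" state_str="starting" flags="0"
           flags_str="" owner="node1" last_owner="node1" restarts="0"
           last_transition="1294676678" last_transition_str="xxxxxxxxxx"/>
  </groups>
</clustat>
"""


def service_owners(clustat_xml, service):
    """Return (owner, last_owner) for the named service, or None if absent."""
    root = ET.fromstring(clustat_xml)
    for group in root.iter("group"):
        # clustat prefixes service names with "service:" in the name attribute.
        if group.get("name") == "service:" + service:
            return group.get("owner"), group.get("last_owner")
    return None


if __name__ == "__main__":
    print(service_owners(SAMPLE, "S"))
```

On a live node the XML would come from running clustat -x, for example through subprocess.run, rather than from a fixed string. With the sample above both fields come back as node1, which is exactly the problem.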

Any input on how to find out the name of the failed node from the
other cluster node, on which the failed node's services are starting?

Thanks

On Mon, Jan 10, 2011 at 6:58 PM, Xavier Montagutelli
<xavier.montagutelli@xxxxxxxxx> wrote:
> Hello Parvez,
>
> On Monday 10 January 2011 09:51:14 Parvez Shaikh wrote:
>> Dear experts,
>>
>> I have a two-node cluster (node1 and node2) with manual fencing
>> configured. Service S2 is running on node2. To verify that failover
>> happens, I shut down node2. I see the following message in
>> /var/log/messages:
>>
>>     agent "fence_manual" reports: failed: fence_manual no node name
>
> I am not an expert, but could you show us your cluster.conf file?
>
> You need to give a "nodename" attribute to the fence_manual agent
> somewhere; the error message makes me think it is missing.
>
> For example:
>
>   <fencedevices>
>     <fencedevice agent="fence_manual" name="my_fence_manual"/>
>   </fencedevices>
>   ...
>   <clusternode name="node2" ...>
>     <fence>
>       <method name="1">
>         <device name="my_fence_manual" nodename="node2"/>
>       </method>
>     </fence>
>   </clusternode>
>
>>
>> "fence_ack_manual -n node2" does not work; it says there is no FIFO
>> in /tmp. "fence_ack_manual -n node2 -e" does work, and then service
>> S2 fails over to node1.
>>
>> I am trying to find out why fence_manual is reporting this error.
>> node2 is a pingable hostname and has an entry in /etc/hosts on node1
>> (and vice versa). I also see that after failover, "clustat -x" gives
>> me the cluster status (in XML format) as follows:
>>
>> <?xml version="1.0"?>
>> <clustat version="4.1.1">
>>   <groups>
>>     <group name="service:S" state="111" state_str="starting" flags="0"
>> flags_str="" owner="node1" last_owner="node1" restarts="0"
>> last_transition="1294676678" last_transition_str="xxxxxxxxxx"/>
>>   </groups>
>> </clustat>
>>
>> I was expecting last_owner to be node2 (because that is the node
>> which was running service S and has failed), which would indicate
>> that the service is failing over FROM node2. Is there a way for a
>> node in the cluster (the node to which the service is failing over)
>> to determine from which node the given service is failing over?
>>
>> Any input would be greatly appreciated.
>>
>> Thanks
>>
>> Yours gratefully
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster@xxxxxxxxxx
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
> --
> Xavier Montagutelli                      Tel : +33 (0)5 55 45 77 20
> Service Commun Informatique              Fax : +33 (0)5 55 45 75 95
> Universite de Limoges
> 123, avenue Albert Thomas
> 87060 Limoges cedex