Re: clurgmgrd : <notice> relocating a service to better node

Digimer <lists@xxxxxxxxxx> · Wed, 11 Apr 2012 02:42:07 -0400

Still shy on info. I shall assume that you are trying to start 'my_proc'
on 'my_blade2.my_domain' and it moves to 'my_blade1.my_domain'? What do
the syslogs say when you start the resource? What is the output of
'cman_tool status'?

Please provide all the data you think might be relevant. It's easier to
ignore useless data than it is to guess about important but missing data.

A few additional comments:

* When sharing configs, please leave everything as is, except passwords.
By obfuscating specific-but-harmless details, you may remove the very thing
that is causing the problem.

* clean_start="1" is not wise. It's telling the node to assume that the
peer is dead when it starts but can't talk to its peer. This is asking
for a split-brain.

* missing_as_off="1" is also dangerous, as it makes an assumption about
a node's state, rather than verifying it.

* You have three IP resources defined, but only one in use.
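
To make the comments above concrete, here is a rough, untested sketch of how
the affected parts of cluster.conf might look. It is an illustration only, not
a drop-in config: post_join_delay="30" is an assumed value (the original has
0), whether missing_as_off can safely be dropped depends on the BladeCenter
setup, and config_version must be incremented whenever the file changes.

    <!-- Per-node fence device without missing_as_off (blade 2 shown) -->
    <device blade="2" name="BladeCenterFencing"/>

    <!-- Only the IP resource the service actually references -->
    <resources>
      <script file="/localhome/parvez/my_ha" name="my_HaAgent"/>
      <ip address="192.168.11.175" monitor_link="1"/>
    </resources>

    <!-- clean_start="0" leaves startup fencing enabled.
         post_join_delay="30" is an assumption, not from the original. -->
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="30"/>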

Digimer

On 04/11/2012 02:31 AM, Parvez Shaikh wrote:
> Hi Digimer,
> 
> 
> cman_tool version
> 6.2.0 config 3
> 
> RPM versions -
> 
> cman-2.0.115-34.el5
> rgmanager-2.0.52-6.el5
> 
> I am on RHEL 5.5
> 
> The configuration is like this -
> 
> Cluster of 2 nodes. Each node is an IBM blade hosted in a chassis. The
> private network within the chassis is used for heartbeat across cluster
> nodes, and the cluster service consists of an IP resource and my own
> server, which listens on this IP resource.
> 
> cluster.conf file -
> 
>     <?xml version="1.0"?>
>     <cluster alias="PCluster" config_version="3" name="PCluster">
>       <clusternodes>
>         <clusternode name="my_blade2.my_domain" nodeid="2" votes="1">
>           <fence>
>             <method name="1">
>               <device blade="2" missing_as_off="1" name="BladeCenterFencing"/>
>             </method>
>           </fence>
>         </clusternode>
>         <clusternode name="my_blade1.my_domain" nodeid="1" votes="1">
>           <fence>
>             <method name="1">
>               <device blade="1" missing_as_off="1" name="BladeCenterFencing"/>
>             </method>
>           </fence>
>         </clusternode>
>       </clusternodes>
>       <cman expected_votes="1" two_node="1"/>
>       <fencedevices>
>         <fencedevice agent="fence_bladecenter" ipaddr="XXXXX" login="USERID" name="BladeCenterFencing" passwd="XXXXX"/>
>       </fencedevices>
>       <rm>
>         <resources>
>           <script file="/localhome/parvez/my_ha" name="my_HaAgent"/>
>           <ip address="192.168.11.171" monitor_link="1"/>
>           <ip address="192.168.11.175" monitor_link="1"/>
>           <ip address="192.168.11.176" monitor_link="1"/>
>         </resources>
>         <failoverdomains>
>           <failoverdomain name="my_domain" nofailback="1" ordered="1" restricted="1">
>             <failoverdomainnode name="my_blade2.my_domain" priority="2"/>
>             <failoverdomainnode name="my_blade1.my_domain" priority="1"/>
>           </failoverdomain>
>         </failoverdomains>
>         <service autostart="0" domain="my_domain" name="my_proc" recovery="relocate">
>           <script ref="my_HaAgent"/>
>           <ip ref="192.168.11.175"/>
>         </service>
>       </rm>
>       <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="0"/>
>     </cluster>
> 
> 
> On Wed, Apr 11, 2012 at 11:51 AM, Digimer <lists@xxxxxxxxxx> wrote:
> 
>     On 04/11/2012 02:14 AM, Parvez Shaikh wrote:
>     > Hi,
>     >
>     > When I start or enable a service (that was previously disabled) on a
>     > cluster node, I see a message saying clurgmgrd is relocating the
>     > service to a "better" node.
>     >
>     > I do not understand why. I can relocate the service back to the node
>     > where I saw the above message, and it runs fine there.
>     >
>     > What could "better" node mean? Better in what sense, given that the
>     > hardware and software configurations of both cluster nodes are the
>     > same? What situation could possibly trigger this?
>     >
>     > Thanks,
>     > Parvez
> 
>     What version of the cluster software are you using? What is the
>     configuration? To get help, you need to share more details. :)
> 
>     --
>     Digimer
>     Papers and Projects: https://alteeve.com
> 
> 

-- 
Digimer
Papers and Projects: https://alteeve.com

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster