Maybe you're missing recovery="restart" in your services.
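For what it's worth, the restart counter only takes effect when max_restarts is paired with restart_expire_time on rgmanager versions that support it — a sketch (the 600-second window is an arbitrary example value):

    <service autostart="0" domain="mydomain" exclusive="0"
             max_restarts="5" restart_expire_time="600"
             name="mgmt" recovery="restart">
        <script ref="myHaAgent"/>
        <ip ref="192.168.51.51"/>
    </service>

With that pair set, restarts older than restart_expire_time seconds are forgotten; without the expiry window, check whether your rgmanager release honours max_restarts at all.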
2012/10/31 Parvez Shaikh <parvez.h.shaikh@xxxxxxxxx>
Hi Digimer,
cman_tool version gives the following -
6.2.0 config 22
Cluster.conf -
<?xml version="1.0"?>
<cluster alias="PARVEZ" config_version="22" name="PARVEZ">
    <clusternodes>
        <clusternode name="myblade2" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device blade="2" missing_as_off="1" name="BladeCenterFencing-1"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="myblade1" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device blade="1" missing_as_off="1" name="BladeCenterFencing-1"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman expected_votes="1" two_node="1"/>
    <fencedevices>
        <fencedevice agent="fence_bladecenter" ipaddr="mm-1.mydomain.com" login="XXXX" name="BladeCenterFencing-1" passwd="XXXXX" shell_timeout="10"/>
    </fencedevices>
    <rm>
        <resources>
            <script file="/localhome/my/my_ha" name="myHaAgent"/>
            <ip address="192.168.51.51" monitor_link="1"/>
        </resources>
        <failoverdomains>
            <failoverdomain name="mydomain" nofailback="1" ordered="1" restricted="1">
                <failoverdomainnode name="myblade2" priority="2"/>
                <failoverdomainnode name="myblade1" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <service autostart="0" domain="mydomain" exclusive="0" max_restarts="5" name="mgmt" recovery="restart">
            <script ref="myHaAgent"/>
            <ip ref="192.168.51.51"/>
        </service>
    </rm>
    <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="0"/>
</cluster>
Thanks,
Parvez

On Tue, Oct 30, 2012 at 9:25 PM, Digimer <lists@xxxxxxxxxx> wrote:
On 10/30/2012 01:54 AM, Parvez Shaikh wrote:
> Hi experts,
>
> I have defined a service as follows in cluster.conf -
>
> <service autostart="0" domain="mydomain" exclusive="0"
> max_restarts="5" name="mgmt" recovery="restart">
> <script ref="myHaAgent"/>
> <ip ref="192.168.51.51"/>
> </service>
>
> I mentioned max_restarts="5" hoping that if the cluster fails to start the
> service 5 times, it will relocate it to another cluster node in the
> failover domain.
>
> To check this, I brought down the NIC hosting the service's floating IP
> and got the following logs -
>
> Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> Link for eth1: Not
> detected
> Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> No link on eth1...
> Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> No link on eth1...
> Oct 30 14:11:49 XXXX clurgmgrd[10753]: <notice> status on ip
> "192.168.51.51" returned 1 (generic error)
> Oct 30 14:11:49 XXXX clurgmgrd[10753]: <notice> Stopping service
> service:mgmt
> *Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is
> recovering*
> Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Recovering failed
> service service:mgmt
> Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> start on ip
> "192.168.51.51" returned 1 (generic error)
> Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #68: Failed to start
> service:mgmt; return value: 1
> Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Stopping service
> service:mgmt
> *Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is
> recovering*
> Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #71: Relocating failed
> service service:mgmt
> Oct 30 14:12:01 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is
> stopped
> Oct 30 14:12:01 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is
> stopped
>
> But from the log it appears that the cluster tried to restart the service
> only ONCE before relocating.
>
> I was expecting the cluster to retry starting this service five times on
> the same node before relocating.
>
> Can anybody correct my understanding?
>
> Thanks,
> Parvez

What version? Please paste your full cluster.conf.
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
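As an aside, the restart-versus-relocate decision Parvez is asking about can be modelled roughly like this — a simplified sketch of the documented max_restarts / restart_expire_time semantics, not rgmanager source:

```python
def recover(restart_times, max_restarts, restart_expire_time, now):
    """Toy model of rgmanager's restart threshold: restarts older than
    restart_expire_time seconds are forgotten; once max_restarts recent
    restarts have accumulated, the next failure relocates instead."""
    recent = [t for t in restart_times if now - t < restart_expire_time]
    if len(recent) >= max_restarts:
        return recent, "relocate"
    recent.append(now)
    return recent, "restart"

# Six failures in quick succession with max_restarts=5: the first five
# are restarted in place, the sixth is relocated.
history = []
decisions = []
for tick in range(6):
    history, decision = recover(history, 5, 300, tick)
    decisions.append(decision)
print(decisions)
# → ['restart', 'restart', 'restart', 'restart', 'restart', 'relocate']
```

Note this models recovery after a failed *status* check; whether a failed *start* attempt (as in the logs above) counts toward the same counter or relocates immediately is worth checking against your rgmanager version.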
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
--
this is my life and I live it for as long as God wills