What does 'rpm -q cman' return?

This looks very odd;

> <fencedevice agent="fence_bladecenter"
> ipaddr="mm-1.mydomain.com <http://mm-1.mydomain.com>"

Please remove this for now;

> <fence_daemon clean_start="1" post_fail_delay="0"
> post_join_delay="0"/>

In general, you don't want to assume a clean start. It's asking for
trouble. The default delays are also sane. You can always come back to
this later after this issue is resolved, if you wish.

On 10/30/2012 09:20 PM, Parvez Shaikh wrote:
> Hi Digimer,
>
> cman_tool version gives following -
>
> 6.2.0 config 22
>
> Cluster.conf -
>
> <?xml version="1.0"?>
> <cluster alias="PARVEZ" config_version="22" name="PARVEZ">
>   <clusternodes>
>     <clusternode name="myblade2" nodeid="2" votes="1">
>       <fence>
>         <method name="1">
>           <device blade="2" missing_as_off="1" name="BladeCenterFencing-1"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="myblade1" nodeid="1" votes="1">
>       <fence>
>         <method name="1">
>           <device blade="1" missing_as_off="1" name="BladeCenterFencing-1"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <cman expected_votes="1" two_node="1"/>
>   <fencedevices>
>     <fencedevice agent="fence_bladecenter" ipaddr="mm-1.mydomain.com <http://mm-1.mydomain.com>" login="XXXX" name="BladeCenterFencing-1" passwd="XXXXX" shell_timeout="10"/>
>   </fencedevices>
>   <rm>
>     <resources>
>       <script file="/localhome/my/my_ha" name="myHaAgent"/>
>       <ip address="192.168.51.51" monitor_link="1"/>
>     </resources>
>     <failoverdomains>
>       <failoverdomain name="mydomain" nofailback="1" ordered="1" restricted="1">
>         <failoverdomainnode name="myblade2" priority="2"/>
>         <failoverdomainnode name="myblade1" priority="1"/>
>       </failoverdomain>
>     </failoverdomains>
>     <service autostart="0" domain="mydomain" exclusive="0" max_restarts="5" name="mgmt" recovery="restart">
>       <script ref="myHaAgent"/>
>       <ip ref="192.168.51.51"/>
>     </service>
>   </rm>
>   <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="0"/>
> </cluster>
>
> Thanks,
> Parvez
>
> On Tue, Oct 30, 2012 at 9:25 PM, Digimer <lists@xxxxxxxxxx <mailto:lists@xxxxxxxxxx>> wrote:
>
> > On 10/30/2012 01:54 AM, Parvez Shaikh wrote:
> > > Hi experts,
> > >
> > > I have defined a service as follows in cluster.conf -
> > >
> > > <service autostart="0" domain="mydomain" exclusive="0" max_restarts="5" name="mgmt" recovery="restart">
> > >   <script ref="myHaAgent"/>
> > >   <ip ref="192.168.51.51"/>
> > > </service>
> > >
> > > I mentioned max_restarts=5 hoping that if cluster fails to start service
> > > 5 times, then it will relocate to another cluster node in failover domain.
> > >
> > > To check this, I turned down NIC hosting service's floating IP and got
> > > following logs -
> > >
> > > Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> Link for eth1: Not detected
> > > Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> No link on eth1...
> > > Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> No link on eth1...
> > > Oct 30 14:11:49 XXXX clurgmgrd[10753]: <notice> status on ip "192.168.51.51" returned 1 (generic error)
> > > Oct 30 14:11:49 XXXX clurgmgrd[10753]: <notice> Stopping service service:mgmt
> > > *Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is recovering*
> > > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Recovering failed service service:mgmt
> > > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> start on ip "192.168.51.51" returned 1 (generic error)
> > > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #68: Failed to start service:mgmt; return value: 1
> > > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Stopping service service:mgmt
> > > *Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is recovering
> > > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #71: Relocating failed service service:mgmt*
> > > Oct 30 14:12:01 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is stopped
> > > Oct 30 14:12:01 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is stopped
> > >
> > > But from the log it appears that cluster tried to restart service only
> > > ONCE before relocating.
> > >
> > > I was expecting cluster to retry starting this service five times on the
> > > same node before relocating
> > >
> > > Can anybody correct my understanding?
> > >
> > > Thanks,
> > > Parvez
> >
> > What version? Please paste your full cluster.conf.
> >
> > --
> > Digimer
> > Papers and Projects: https://alteeve.ca/w/
> > What if the cure for cancer is trapped in the mind of a person without
> > access to education?

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster