Output of 'rpm -q cman':
cman-2.0.115-34.el5
There is no http mentioned in the fencedevice entry; I think my email client is inserting it.
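The actual attribute in the file reads just:

    ipaddr="mm-1.mydomain.com" login="XXXX"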
Thanks,
Parvez
On Wed, Oct 31, 2012 at 10:14 AM, Digimer <lists@xxxxxxxxxx> wrote:
What does 'rpm -q cman' return?
This looks very odd;

> <fencedevice agent="fence_bladecenter"
> ipaddr="mm-1.mydomain.com <http://mm-1.mydomain.com>" login="XXXX"

Please remove this for now;

> <fence_daemon clean_start="1" post_fail_delay="0"
> post_join_delay="0"/>

In general, you don't want to assume a clean start. It's asking for
trouble. The default delays are also sane. You can always come back to
this later after this issue is resolved, if you wish.
On 10/30/2012 09:20 PM, Parvez Shaikh wrote:
> Hi Digimer,
>
> cman_tool version gives the following -
>
> 6.2.0 config 22
>
> Cluster.conf -
>
> <?xml version="1.0"?>
> <cluster alias="PARVEZ" config_version="22" name="PARVEZ">
> <clusternodes>
> <clusternode name="myblade2" nodeid="2" votes="1">
> <fence>
> <method name="1">
> <device blade="2"
> missing_as_off="1" name="BladeCenterFencing-1"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="myblade1" nodeid="1" votes="1">
> <fence>
> <method name="1">
> <device blade="1"
> missing_as_off="1" name="BladeCenterFencing-1"/>
> </method>
> </fence>
> </clusternode>
> </clusternodes>
> <cman expected_votes="1" two_node="1"/>
> <fencedevices>
> <fencedevice agent="fence_bladecenter"
> ipaddr="mm-1.mydomain.com <http://mm-1.mydomain.com>" login="XXXX"
> name="BladeCenterFencing-1" passwd="XXXXX" shell_timeout="10"/>
> </fencedevices>
> <rm>
> <resources>
> <script file="/localhome/my/my_ha"
> name="myHaAgent"/>
> <ip address="192.168.51.51" monitor_link="1"/>
> </resources>
> <failoverdomains>
> <failoverdomain name="mydomain" nofailback="1"
> ordered="1" restricted="1">
> <failoverdomainnode name="myblade2"
> priority="2"/>
> <failoverdomainnode name="myblade1"
> priority="1"/>
> </failoverdomain>
> </failoverdomains>
> <service autostart="0" domain="mydomain" exclusive="0"
> max_restarts="5" name="mgmt" recovery="restart">
> <script ref="myHaAgent"/>
> <ip ref="192.168.51.51"/>
> </service>
> </rm>
> <fence_daemon clean_start="1" post_fail_delay="0"
> post_join_delay="0"/>
> </cluster>
>
> Thanks,
> Parvez
>
> On Tue, Oct 30, 2012 at 9:25 PM, Digimer <lists@xxxxxxxxxx> wrote:
>
> On 10/30/2012 01:54 AM, Parvez Shaikh wrote:
> > Hi experts,
> >
> > I have defined a service as follows in cluster.conf -
> >
> > <service autostart="0" domain="mydomain" exclusive="0"
> > max_restarts="5" name="mgmt" recovery="restart">
> > <script ref="myHaAgent"/>
> > <ip ref="192.168.51.51"/>
> > </service>
> >
> > I mentioned max_restarts=5 hoping that if the cluster fails to start
> > the service 5 times, it will relocate it to another cluster node in
> > the failover domain.
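> >
> > (From what I understand of the rgmanager docs, max_restarts is meant
> > to be paired with restart_expire_time, something like -
> >
> > <service autostart="0" domain="mydomain" exclusive="0"
> > max_restarts="5" restart_expire_time="300" name="mgmt"
> > recovery="restart">
> >
> > - i.e. tolerate at most 5 restarts within a 300-second window before
> > giving up; the 300 here is just an example value.)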
> >
> > To check this, I brought down the NIC hosting the service's floating
> > IP and got the following logs -
> >
> > Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> Link for eth1: Not
> > detected
> > Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> No link on eth1...
> > Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> No link on eth1...
> > Oct 30 14:11:49 XXXX clurgmgrd[10753]: <notice> status on ip
> > "192.168.51.51" returned 1 (generic error)
> > Oct 30 14:11:49 XXXX clurgmgrd[10753]: <notice> Stopping service
> > service:mgmt
> > *Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Service service:mgmt
> > is recovering*
> > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Recovering failed
> > service service:mgmt
> > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> start on ip
> > "192.168.51.51" returned 1 (generic error)
> > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #68: Failed to start
> > service:mgmt; return value: 1
> > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Stopping service
> > service:mgmt
> > *Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Service service:mgmt
> > is recovering
> > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #71: Relocating
> > failed service service:mgmt*
> > Oct 30 14:12:01 XXXX clurgmgrd[10753]: <notice> Service service:mgmt
> > is stopped
> > Oct 30 14:12:01 XXXX clurgmgrd[10753]: <notice> Service service:mgmt
> > is stopped
> >
> > But from the logs it appears that the cluster tried to restart the
> > service only ONCE before relocating.
> >
> > I was expecting the cluster to retry starting this service five
> > times on the same node before relocating.
> >
> > Can anybody correct my understanding?
> >
> > Thanks,
> > Parvez
>
> What version? Please paste your full cluster.conf.
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
>
>
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster