Re: restart or relocate?

Carlo Mandelli <camandel@xxxxxxxxxx> · Thu, 30 Nov 2006 11:50:10 +0100

Hi all,

I applied the patch, but even though the httpd script now restarts correctly
and returns exit code 0, rgmanager keeps restarting the service continuously
because of the link failure and never relocates it to the other node:

Nov 30 11:18:42 node1 clurgmgrd: [12994]: <info> Executing /etc/init.d/httpd status
Nov 30 11:19:12 node1 clurgmgrd: [12994]: <info> Executing /etc/init.d/httpd status
Nov 30 11:19:39 node1 kernel: tg3: eth1: Link is down.
Nov 30 11:19:42 node1 clurgmgrd: [12994]: <warning> Link for eth1: Not detected
Nov 30 11:19:42 node1 clurgmgrd: [12994]: <warning> No link on eth1...
Nov 30 11:19:42 node1 clurgmgrd[12994]: <notice> status on ip "192.168.0.3" returned 1 (generic error)
Nov 30 11:19:42 node1 clurgmgrd[12994]: <notice> Stopping service http
Nov 30 11:19:42 node1 clurgmgrd: [12994]: <info> Executing /etc/init.d/httpd stop
Nov 30 11:19:42 node1 httpd: httpd shutdown succeeded
Nov 30 11:19:43 node1 clurgmgrd: [12994]: <info> Removing IPv4 address 192.168.0.3 from eth1
Nov 30 11:19:53 node1 clurgmgrd[12994]: <notice> Service http is recovering
Nov 30 11:19:53 node1 clurgmgrd[12994]: <notice> Recovering failed service http
Nov 30 11:19:53 node1 clurgmgrd: [12994]: <warning> Link for eth1: Not detected
Nov 30 11:19:53 node1 clurgmgrd: [12994]: <info> Executing /etc/init.d/httpd start
Nov 30 11:19:53 node1 httpd: httpd startup succeeded
<...>

This is my service configuration:

 <service autostart="1" name="http">
       <ip address="192.168.0.3" monitor_link="1"/>
       <script file="/etc/init.d/httpd" name="httpd"/>
 </service>
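
For anyone who wants to repeat the exit-code check by hand, a rough sketch
along these lines may help. The helper name check_init_script is hypothetical
and not part of rgmanager. The assumption, based on a reading of the FAQ entry
referenced in the quoted reply below, is that the init script should return 0
for stop and start even when the service is already in that state.

```shell
#!/bin/sh
# Hypothetical helper (not part of rgmanager): run an init script with each
# given action in order and print the exit code of every call.
# Returns 0 only if every action returned 0, otherwise 1.
#
# Intended use on a TEST node, since it really stops and starts the service:
#   check_init_script /etc/init.d/httpd stop stop start start
check_init_script() {
    script="$1"
    shift
    rc_all=0
    for action in "$@"; do
        # Discard the script's own messages; only the exit code matters here.
        "$script" "$action" >/dev/null 2>&1
        rc=$?
        echo "$action -> $rc"
        [ "$rc" -eq 0 ] || rc_all=1
    done
    return "$rc_all"
}
```

Repeating stop and start is deliberate: the second call of each shows what
the script returns when the service is already stopped or already running.
A non-zero status on a stopped service is expected, so status is left out of
the suggested action list.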

Thanks
Carlo

Robert Peterson wrote the following on 29/11/2006 19:39:
> Carlo Mandelli wrote:
>> Hi all,
>>
>> I'm testing a two-node cluster (RHCS U4) with Apache and one monitored
>> IP on eth1 (VIP 192.168.0.3); the heartbeat is on eth0.
>>
>> When I unplug the cable (eth1) on the active node, I get these errors:
>>
>> Nov 29 17:03:54 node1 clurgmgrd: [4368]: <info> Executing /etc/init.d/httpd status
>> Nov 29 17:04:24 node1 clurgmgrd: [4368]: <info> Executing /etc/init.d/httpd status
>> Nov 29 17:04:25 node1 kernel: tg3: eth1: Link is down.
>> Nov 29 17:04:44 node1 clurgmgrd: [4368]: <warning> Link for eth1: Not detected
>> Nov 29 17:04:44 node1 clurgmgrd: [4368]: <warning> No link on eth1...
>> Nov 29 17:04:44 node1 clurgmgrd[4368]: <notice> status on ip "192.168.0.3" returned 1 (generic error)
>> Nov 29 17:04:44 node1 clurgmgrd[4368]: <notice> Stopping service http
>> Nov 29 17:04:44 node1 clurgmgrd: [4368]: <info> Executing /etc/init.d/httpd stop
>> Nov 29 17:04:44 node1 httpd: httpd shutdown succeeded
>> Nov 29 17:04:44 node1 clurgmgrd: [4368]: <info> Removing IPv4 address 192.168.0.3 from eth1
>> Nov 29 17:04:54 node1 clurgmgrd[4368]: <notice> Service http is recovering
>> Nov 29 17:04:54 node1 clurgmgrd[4368]: <notice> Recovering failed service http
>> Nov 29 17:04:54 node1 clurgmgrd: [4368]: <warning> Link for eth1: Not detected
>> Nov 29 17:04:54 node1 clurgmgrd: [4368]: <info> Executing /etc/init.d/httpd start
>> Nov 29 17:04:54 node1 httpd: httpd startup succeeded
>> Nov 29 17:04:54 node1 clurgmgrd[4368]: <notice> Service http started
>> Nov 29 17:05:04 node1 clurgmgrd: [4368]: <warning> 192.168.0.3 is not configured
>> Nov 29 17:05:04 node1 clurgmgrd[4368]: <notice> status on ip "192.168.0.3" returned 1 (generic error)
>> Nov 29 17:05:04 node1 clurgmgrd[4368]: <notice> Stopping service http
>> Nov 29 17:05:04 node1 clurgmgrd: [4368]: <info> Executing /etc/init.d/httpd stop
>> Nov 29 17:05:04 node1 httpd: httpd shutdown succeeded
>> Nov 29 17:05:04 node1 clurgmgrd[4368]: <notice> Service http is recovering
>> Nov 29 17:05:04 node1 clurgmgrd[4368]: <notice> Recovering failed service http
>> Nov 29 17:05:04 node1 clurgmgrd: [4368]: <warning> Link for eth1: Not detected
>> Nov 29 17:05:04 node1 clurgmgrd: [4368]: <info> Executing /etc/init.d/httpd start
>> Nov 29 17:05:04 node1 httpd: httpd startup succeeded
>> <...>
>>
>> and it restarts the service continuously.
>>
>> It fails over only if I change the recovery mode in cluster.conf:
>>
>> <service autostart="1" name="http" recovery="relocate">
>>
>> Is there any way to set a maximum number of restart attempts before the
>> service is relocated?
>>
>> Thanks
>> Carlo
>>   
> Hi Carlo,
> 
> You're probably the victim of the init-script-not-returning-zero issue. 
> See:
> http://sources.redhat.com/cluster/faq.html#rgm_wontrestart
> 
> Regards,
> 
> Bob Peterson
> Red Hat Cluster Suite
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
