Re: Problems starting only one cluster service (SOLVED but ...)

carlopmart <carlopmart@xxxxxxxxx> · Thu, 29 Nov 2007 15:51:52 +0100

carlopmart wrote:
carlopmart wrote:
Lon Hohberger wrote:
On Tue, 2007-11-27 at 11:26 +0100, carlopmart wrote:
Hi all

  I have a very strange problem. I have configured three nodes under
RHCS on RHEL 5.1 servers. Everything works OK except for one service,
which never starts when rgmanager starts up. My cluster.conf is:

<?xml version="1.0"?>
<cluster alias="RhelXenCluster" config_version="17" name="RhelXenCluster">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="rhelclu01.hpulabs.org" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="gnbd-fence" nodename="rhelclu01.hpulabs.org"/>
                                </method>
                        </fence>
                        <multicast addr="239.192.75.55" interface="eth0"/>
                </clusternode>
                <clusternode name="rhelclu02.hpulabs.org" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="gnbd-fence" nodename="rhelclu02.hpulabs.org"/>
                                </method>
                        </fence>
                        <multicast addr="239.192.75.55" interface="eth0"/>
                </clusternode>
                <clusternode name="rhelclu03.hpulabs.org" nodeid="3" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="gnbd-fence" nodename="rhelclu03.hpulabs.org"/>
                                </method>
                        </fence>
                        <multicast addr="239.192.75.55" interface="xenbr0"/>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="0">
                <multicast addr="239.192.75.55"/>
        </cman>
        <fencedevices>
                <fencedevice agent="fence_gnbd" name="gnbd-fence" servers="rhelclu03.hpulabs.org"/>
        </fencedevices>
        <rm log_facility="local4" log_level="7">
                <failoverdomains>
                        <failoverdomain name="PriCluster" ordered="1" restricted="1">
                                <failoverdomainnode name="rhelclu01.hpulabs.org" priority="1"/>
                                <failoverdomainnode name="rhelclu02.hpulabs.org" priority="2"/>
                        </failoverdomain>
                        <failoverdomain name="SecCluster" ordered="1" restricted="1">
                                <failoverdomainnode name="rhelclu02.hpulabs.org" priority="1"/>
                                <failoverdomainnode name="rhelclu01.hpulabs.org" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="172.25.50.10" monitor_link="1"/>
                        <ip address="172.25.50.11" monitor_link="1"/>
                        <ip address="172.25.50.12" monitor_link="1"/>
                        <ip address="172.25.50.13" monitor_link="1"/>
                        <ip address="172.25.50.14" monitor_link="1"/>
                        <ip address="172.25.50.15" monitor_link="1"/>
                        <ip address="172.25.50.16" monitor_link="1"/>
                        <ip address="172.25.50.17" monitor_link="1"/>
                        <ip address="172.25.50.18" monitor_link="1"/>
                        <ip address="172.25.50.19" monitor_link="1"/>
                        <ip address="172.25.50.20" monitor_link="1"/>
                </resources>
                <service autostart="1" domain="PriCluster" name="dns-svc" recovery="relocate">
                        <ip ref="172.25.50.10">
                                <script file="/data/cfgcluster/etc/init.d/named" name="named"/>
                        </ip>
                </service>
                <service autostart="1" domain="SecCluster" name="mail-svc" recovery="relocate">
                        <ip ref="172.25.50.11">
                                <script file="/data/cfgcluster/etc/init.d/postfix-cluster" name="postfix"/>
                        </ip>
                </service>
                <service autostart="1" domain="SecCluster" name="rsync-svc" recovery="relocate">
                        <ip ref="172.25.50.13">
                                <script file="/data/cfgcluster/etc/init.d/rsyncd" name="rsyncd"/>
                        </ip>
                </service>
                <service autostart="1" domain="PriCluster" name="wwwsoft-svc" recovery="relocate">
                        <ip ref="172.25.50.14">
                                <script file="/data/cfgcluster/etc/init.d/httpd-mirror" name="httpd-mirror"/>
                        </ip>
                </service>
                <service autostart="1" domain="SecCluster" name="proxy-svc" recovery="relocate">
                        <ip ref="172.25.50.15">
                                <script file="/data/cfgcluster/etc/init.d/squid" name="squid"/>
                        </ip>
                </service>
        </rm>
</cluster>

  The service that returns errors and never starts when rgmanager
starts up is postfix-cluster (mail-svc). In the maillog file I find these errors:

Nov 26 11:27:31 rhelclu01 postfix[27959]: fatal: parameter inet_interfaces: no local interface found for 172.25.50.11
Nov 26 11:27:43 rhelclu01 postfix[28313]: fatal: /data/cfgcluster/etc/postfix-cluster/postfix-script: Permission denied

  But that's not true: if I start this service manually, everything
works OK, and the Postfix configuration is fine. What could the problem
be? I don't know why rgmanager doesn't configure the 172.25.50.11
address before executing the postfix-cluster script...

When you start it manually, how do you do it?
* adding the IP manually and running the script?
* rg_test?
* clusvcadm -e?

-- Lon

Another strange thing: this morning the service was stopped, and when I
try to start it using clusvcadm, it returns this error:

Nov 28 09:28:21 rhelclu01 clurgmgrd[1450]: <warning> #68: Failed to start service:mail-svc; return value: 1
Nov 28 09:28:21 rhelclu01 clurgmgrd[1450]: <notice> Stopping service service:mail-svc
Nov 28 09:28:22 rhelclu01 clurgmgrd: [1450]: <err> script:postfix: stop of /data/cfgcluster/etc/init.d/postfix-cluster failed (returned 1)
Nov 28 09:28:22 rhelclu01 clurgmgrd[1450]: <notice> stop on script "postfix" returned 1 (generic error)
Nov 28 09:28:22 rhelclu01 in.rdiscd[11610]: setsockopt (IP_ADD_MEMBERSHIP): Address already in use
Nov 28 09:28:22 rhelclu01 in.rdiscd[11610]: Failed joining addresses
Nov 28 09:28:32 rhelclu01 clurgmgrd[1450]: <notice> Service service:mail-svc is recovering
Nov 28 09:28:32 rhelclu01 clurgmgrd[1450]: <warning> #71: Relocating failed service service:mail-svc
Nov 28 09:28:32 rhelclu01 clurgmgrd[1450]: <notice> Stopping service service:mail-svc

I don't understand this. IP 172.25.50.11 isn't used by anyone else...

Finally I have found the problem: I needed to set the
alternate_config_directories parameter in the main.cf of the first
(default) Postfix instance, and now everything works. The service
starts, stops, and relocates correctly, but I have found one small
problem: clurgmgrd doesn't check the status of the service. If I remove
the status option from the resource's init script, nothing happens. Do I
need to add a parameter to cluster.conf so that services are checked
every minute or two?
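One possible approach, not verified on RHEL 5.1: later rgmanager documentation describes overriding a resource agent's status-check interval with a nested <action> element in cluster.conf. Below is a sketch for mail-svc, assuming this rgmanager version honors such an override. The <action> element and its depth/interval values are assumptions here and were not tested in this thread.

```xml
<service autostart="1" domain="SecCluster" name="mail-svc" recovery="relocate">
        <ip ref="172.25.50.11">
                <script file="/data/cfgcluster/etc/init.d/postfix-cluster" name="postfix">
                        <!-- ASSUMPTION: this rgmanager build accepts <action> overrides. -->
                        <!-- Request a status check every 60 seconds for this script resource. -->
                        <action name="status" depth="*" interval="60"/>
                </script>
        </ip>
</service>
```

Whatever the interval, rgmanager can only react to the exit code of the init script's status option, so postfix-cluster must return non-zero when Postfix is not running. If the override is honored, a failed check should then trigger the service's recovery policy (recovery="relocate" here).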

Thanks.

Any hints, please?

--
CL Martinez
carlopmart {at} gmail {d0t} com

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster