Re: IP Relocate Error / IP Restart error

Dan Deshayes <dan.deshayes@xxxxxxxxxxxx> · Tue, 10 Jul 2007 16:50:44 +0200

Lon Hohberger wrote:
On Mon, Jul 09, 2007 at 04:06:40PM +0200, dan.deshayes@xxxxxxxxxxxx wrote:

Hi,
thanks for the reply, but I'm not sure that's my problem.
I couldn't find the syntax for disabling exclusivity (I'm not using the GUI),
but as far as I understand it's disabled by default. I tried
exclusive="0" (not sure if that's the right syntax), but it didn't solve
my problem.
But if the cluster were running in exclusive mode, relocation
shouldn't work either, right?
As stated earlier, the service restarts fine as long as the node already
has an external IP.
Does anyone have other ideas? Maybe it's related to the "IP monitor failing
periodically" issue? But I don't have any problems running the cluster as long as
the bond0 interface goes down, so maybe not.

I haven't figured out the cause here, but disabling the 'ping' test
seems to fix it.

(edit ip.sh and change the 'ping' command to /bin/true or whatever)

I'm afraid it didn't help much.
I changed pingcmd in the ping_check function to /bin/true and restarted
the rgmanager instances, but it still didn't work.

Here is my full configuration: http://nangilima.se/cluster.conf

The full cluster runs without problems when it is first started,
but when I then try to restart the service with 'clusvcadm -R' it says:
Jul 10 16:22:20 asl012 clurgmgrd[412]: <notice> Stopping service service:www-project1
Jul 10 16:22:31 asl012 clurgmgrd[412]: <notice> Service service:www-project1 is stopped
Jul 10 16:22:31 asl012 clurgmgrd[412]: <notice> Starting stopped service service:www-project1
Jul 10 16:22:32 asl012 clurgmgrd[412]: <notice> start on ip "<external ip 1>" returned 1 (generic error)
Jul 10 16:22:32 asl012 clurgmgrd[412]: <warning> #68: Failed to start service:www-project1; return value: 1
Jul 10 16:22:32 asl012 clurgmgrd[412]: <notice> Stopping service service:www-project1
Jul 10 16:22:32 asl012 clurgmgrd: [412]: <err> script:psql-db: stop of /etc/init.d/postgresql failed (returned 1)
Jul 10 16:22:32 asl012 clurgmgrd[412]: <notice> stop on script "psql-db" returned 1 (generic error)
Jul 10 16:22:32 asl012 clurgmgrd[412]: <crit> #12: RG service:www-project1 failed to stop; intervention required
Jul 10 16:22:32 asl012 clurgmgrd[412]: <notice> Service service:www-project1 is failed
Jul 10 16:22:32 asl012 clurgmgrd[412]: <crit> #13: Service service:www-project1 failed to stop cleanly
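
For anyone trying to reproduce this, a small filter like the one below can pull the relevant rgmanager lines out of syslog. It is only a sketch: the function name rgmanager_problems is made up for illustration, the log path in the usage comment is an assumption, and it expects each log entry on a single unwrapped line.

```shell
#!/bin/sh
# rgmanager_problems (hypothetical helper name): read syslog lines on stdin
# and print only the clurgmgrd messages that are tagged <warning>, <err> or
# <crit>, or that report a non-zero "returned N" status.
rgmanager_problems() {
    grep -E 'clurgmgrd.*(<(warning|err|crit)>|returned [1-9])'
}

# Usage (the log path is an assumption; adjust it for your syslog setup):
#   rgmanager_problems < /var/log/messages
```

The "returned N" alternative is there because the first sign of trouble above, the failed start on the ip resource, is logged only at <notice> level.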

Then I disable the service and enable it on node usl001-mgmnt, which
works fine (since that node has network access through its own IP and route):
Jul 10 16:25:18 usl001 clurgmgrd[30130]: <notice> Starting disabled service service:www-project1
Jul 10 16:25:18 usl001 avahi-daemon[3533]: Registering new address record for <external ip 1> on bond0.
Jul 10 16:25:22 usl001 clurgmgrd[30130]: <notice> Service service:www-project1 started

Relocating it to node usl002-mgmnt also works, and so does relocating it
back to usl001-mgmnt.
But relocating it back to asl012-mgmnt never works, except when I manually
put the IP and route back.
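
The manual sequence described above can be sketched roughly as follows. This is an illustration under assumptions, not a record of the exact commands used: the helper names (run, move_service, restore_ip) are hypothetical, the iproute2 commands stand in for whatever tool was used to put the address back, and /27 is simply the prefix form of the 255.255.255.224 netmask on bond0. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: commands are printed unless DRY_RUN=0 is set explicitly.
DRY_RUN=${DRY_RUN:-1}

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# move_service SERVICE MEMBER: disable a failed service, then enable it
# on the given cluster member.
move_service() {
    run clusvcadm -d "$1"
    run clusvcadm -e "$1" -m "$2"
}

# restore_ip ADDRESS GATEWAY: put the address (/27 = 255.255.255.224)
# and the default route back on bond0 by hand.
restore_ip() {
    run ip addr add "$1/27" dev bond0
    run ip route add default via "$2" dev bond0
}

# Usage, with the placeholders from this mail:
#   move_service www-project1 usl001-mgmnt
#   restore_ip '<external ip 1>' '<gw ip>'
```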

I'm using a bond0 interface configured as follows:
DEVICE=bond0
USERCTL=no
ONBOOT=yes
BROADCAST=<broadcast>
NETWORK=<network>.32
NETMASK=255.255.255.224
IPADDR=<external ip 1>
GATEWAY=<gw ip>

with slave interfaces eth0 and eth3 like this (DEVICE is eth0 or eth3 respectively):
DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none

I can supply more info if anyone wants to give it a shot.
Sorry for repeating my question, but I'm close to a deadline and working
blind ;)

Regards, Dan

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster