Re: force fencing

On Mon, Jul 6, 2009 at 10:08 AM, Armanet Stephane <armanets@xxxxxx> wrote:
Hello list,

I'm trying to set up a three-node cluster with two failover domains for an HA
mail solution.
I want one node active for the IMAP server in the IMAP failover domain, one
node active for SMTP in the SMTP failover domain, and the third node in both
failover domains as a backup node.
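
Roughly, the failover-domain part of that layout looks like this in
cluster.conf (the domain names, priorities and which node plays backup are
only meant to illustrate the idea, not copied from the attached file):

    <failoverdomains>
      <!-- IMAP runs on its primary node and can fall back to the shared backup node -->
      <failoverdomain name="ImapDomain" ordered="1" restricted="1">
        <failoverdomainnode name="centos-imap1.ill.fr" priority="1"/>
        <failoverdomainnode name="centos-imap2.ill.fr" priority="2"/>
      </failoverdomain>
      <!-- SMTP runs on its primary node and can fall back to the same backup node -->
      <failoverdomain name="SmtpDomain" ordered="1" restricted="1">
        <failoverdomainnode name="centos-smtp1.ill.fr" priority="1"/>
        <failoverdomainnode name="centos-imap2.ill.fr" priority="2"/>
      </failoverdomain>
    </failoverdomains>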

I run CentOS 5.3.
My fence device is a WTI power switch.

My cluster.conf is attached.

My SMTP service is composed of:
       1 IP address
       1 amavisd script
       1 postfix script
       2 NFS mounts, for postfix and amavis
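
In cluster.conf that service section looks roughly like this (the NFS server
host, export paths and resource names below are placeholders; the exact values
are in the attachment):

    <service name="Postfix" domain="SmtpDomain" autostart="1">
      <ip address="195.83.126.201" monitor_link="1"/>
      <netfs name="postfix-spool" host="nfs-server" export="/export/postfix"
             mountpoint="/var/spool/postfix" fstype="nfs" force_unmount="1"/>
      <netfs name="amavis-data" host="nfs-server" export="/export/amavis"
             mountpoint="/var/lib/amavis" fstype="nfs" force_unmount="1"/>
      <script name="amavisd" file="/etc/init.d/amavisd"/>
      <script name="postfix" file="/etc/init.d/postfix"/>
    </service>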

If I manually kill the postfix master process (to simulate a crash), my
node is not fenced, and the logs say:

Jul  6 10:00:40 centos-smtp1 clurgmgrd: [4228]: <info> Executing
/etc/init.d/postfix status
Jul  6 10:00:40 centos-smtp1 clurgmgrd: [4228]: <err> script:postfix:
status of /etc/init.d/postfix failed (returned 3)
Jul  6 10:00:40 centos-smtp1 clurgmgrd[4228]: <notice> status on script
"postfix" returned 1 (generic error)
Jul  6 10:00:40 centos-smtp1 clurgmgrd[4228]: <notice> Stopping service
service:Postfix
Jul  6 10:00:40 centos-smtp1 clurgmgrd: [4228]: <info> Executing
/etc/init.d/amavisd stop
Jul  6 10:00:40 centos-smtp1 kernel: do_vfs_lock: VFS is out of sync
with lock manager!
Jul  6 10:00:40 centos-smtp1 last message repeated 8 times
Jul  6 10:00:41 centos-smtp1 clurgmgrd: [4228]: <info> Executing
/etc/init.d/postfix stop
Jul  6 10:00:41 centos-smtp1 clurgmgrd: [4228]: <err> script:postfix:
stop of /etc/init.d/postfix failed (returned 1)
Jul  6 10:00:41 centos-smtp1 clurgmgrd[4228]: <notice> stop on script
"postfix" returned 1 (generic error)
Jul  6 10:00:41 centos-smtp1 clurgmgrd: [4228]: <info> Removing IPv4
address 195.83.126.201/24 from bond0
Jul  6 10:00:41 centos-smtp1 avahi-daemon[3552]: Withdrawing address
record for 195.83.126.201 on bond0.
Jul  6 10:00:51 centos-smtp1 clurgmgrd: [4228]: <info> unmounting
/var/lib/amavis
Jul  6 10:00:51 centos-smtp1 clurgmgrd: [4228]: <info> unmounting
/var/spool/postfix
Jul  6 10:00:51 centos-smtp1 clurgmgrd[4228]: <crit> #12: RG
service:Postfix failed to stop; intervention required
Jul  6 10:00:51 centos-smtp1 clurgmgrd[4228]: <notice> Service
service:Postfix is failed
Jul  6 10:00:52 centos-smtp1 ntpd[3322]: synchronized to 195.83.126.119,
stratum 1

clustat shows:

Cluster Status for cluster-test @ Mon Jul  6 10:02:39 2009
Member Status: Quorate

 Member Name                                                   ID   Status
 ------ ----                                                   ---- ------
 centos-imap1.ill.fr                                              1 Online, Local, rgmanager
 centos-imap2.ill.fr                                              2 Online, rgmanager
 centos-smtp1.ill.fr                                              3 Online, rgmanager
 /dev/disk/by-id/scsi-360a98000567247514634507447594661-part1     0 Online, Quorum Disk

 Service Name              Owner (Last)                   State
 ------- ----              ----- ------                   -----
 service:Imap              centos-imap2.ill.fr            started
 service:Postfix           (centos-smtp1.ill.fr)          failed



So I have to disable the Postfix service with:
       clusvcadm -d Postfix
and re-enable it with:
       clusvcadm -e Postfix



Could you explain why my original SMTP node is not fenced, and why my
service is not started on the second node?
Nodes are fenced only when they lose communication with the other nodes, not when a service fails.
You should check the init scripts to make sure they work fine outside the cluster; return values are important. I think in your case it is failing because you killed postfix in a way that deleted the .pid file, and that made the init script fail.
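
For example, a quick manual check on the node itself, outside the cluster,
shows whether the script behaves the way rgmanager expects (nothing
cluster-specific here, just the exit codes):

    # with postfix killed the way you did it:
    /etc/init.d/postfix status ; echo "status -> $?"  # non-zero is expected here, that is what triggers recovery
    /etc/init.d/postfix stop   ; echo "stop   -> $?"  # this must return 0 even though postfix is already dead
    # your log shows stop returning 1, which is why rgmanager gives up on the service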
BTW, you should configure the services with recovery="relocate" if you want them to be started on a different node.
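That is just an attribute on the service element, something along these lines
(domain name illustrative):

    <service name="Postfix" domain="SmtpDomain" autostart="1" recovery="relocate">
        <!-- ip, netfs and script resources as before -->
    </service>

Note that relocate only helps once the stop phase succeeds; a service whose
stop script fails stays in the failed state until you clear it with clusvcadm,
exactly as in your log.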

Greetings,
Juanra



Is there a way to force the fencing?


--
ARMANET Stephane
Division Projet Technique
Service Informatique
 Groupe Infrastructure

Institut Laue langevin

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

