I've found a problem with a RHEL5 cluster. When I use a prioritized failover
domain and then reset the node which has priority 1, the cluster
relocates the service to the node with priority 2. Then, when node 1 comes
back, the cluster tries to relocate the service back to the primary node. In
the log file I always find:
Nov 19 12:32:26 l2 openais[1977]: [TOTEM] entering OPERATIONAL state.
Nov 19 12:32:26 l2 openais[1977]: [CLM ] got nodejoin message
192.168.10.10
Nov 19 12:32:26 l2 openais[1977]: [CLM ] got nodejoin message
192.168.10.11
Nov 19 12:32:26 l2 openais[1977]: [CPG ] got joinlist message from node 1
Nov 19 12:32:26 l2 clurgmgrd[2687]: <notice> Stopping service
service:vsftpd
Nov 19 12:32:41 l2 clurgmgrd[2687]: <err> #52: Failed changing RG status
Nov 19 12:32:56 l2 clurgmgrd[2687]: <err> #57: Failed changing RG status
Nov 19 12:32:57 l2 clurgmgrd: [2687]: <info> Executing
/etc/init.d/vsftpd status
I tested this many times, and in this case clurgmgrd does not try to run the
script with the stop parameter. But when I relocate the service manually
using clusvcadm, or when both nodes have priority 1, everything
succeeds. It also succeeds if I restart the node using reboot. I
think the automatic relocation (after a crash) starts too early. In my opinion
the cluster does not wait for rgmanager to start.
So far I have tried these packages:
rgmanager-2.0.23-1 or rgmanager-2.0.28-1.el5
cman-2.0.73-1.el5_1.1, cman-2.0.64 and earlier from the RHEL5 CD
openais-0.80.3-7.el5 and earlier from the RHEL5 CD
My cluster.conf file:
<?xml version="1.0"?>
<cluster alias="OBN_HA" config_version="26" name="OBN_HA">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="l2.local" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="l2_fence" nodename="l2.local"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="l1.local" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="l1_fence" nodename="l1.local"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_manual" name="l1_fence"/>
    <fencedevice agent="fence_manual" name="l2_fence"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="OBN" ordered="1" restricted="0">
        <failoverdomainnode name="l1.local" priority="1"/>
        <failoverdomainnode name="l2.local" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <resources/>
    <service autostart="1" domain="OBN" name="vsftpd" recovery="relocate">
      <script file="/etc/init.d/vsftpd" name="vsftpd"/>
    </service>
  </rm>
</cluster>
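To make the intended ordering explicit: in an ordered failover domain the node with the lowest priority number is preferred, so l1.local should own the service whenever it is available. A minimal Python sketch that reads this from the configuration; it embeds only the <rm> part of the file above, and `preferred_nodes` is a hypothetical helper name, not part of any cluster tool.

```python
import xml.etree.ElementTree as ET

# Only the <rm> section of the cluster.conf above, trimmed for the example.
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster alias="OBN_HA" config_version="26" name="OBN_HA">
  <rm>
    <failoverdomains>
      <failoverdomain name="OBN" ordered="1" restricted="0">
        <failoverdomainnode name="l1.local" priority="1"/>
        <failoverdomainnode name="l2.local" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <service autostart="1" domain="OBN" name="vsftpd" recovery="relocate">
      <script file="/etc/init.d/vsftpd" name="vsftpd"/>
    </service>
  </rm>
</cluster>
"""


def preferred_nodes(conf_text, service_name):
    """Return the node names of a service's failover domain, most preferred first."""
    root = ET.fromstring(conf_text)
    service = root.find("./rm/service[@name='%s']" % service_name)
    domain_name = service.get("domain")
    domain = root.find(
        "./rm/failoverdomains/failoverdomain[@name='%s']" % domain_name
    )
    # Lower priority number means more preferred.
    ranked = sorted(
        (int(node.get("priority")), node.get("name"))
        for node in domain.findall("failoverdomainnode")
    )
    return [name for _, name in ranked]


print(preferred_nodes(CLUSTER_CONF, "vsftpd"))
```

For this configuration the list starts with l1.local, which is why the cluster tries to move vsftpd back as soon as that node rejoins.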
Full /var/log/messages log:
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] Sending initial ORF token
Nov 19 12:27:39 l2 openais[1977]: [CLM ] CLM CONFIGURATION CHANGE
Nov 19 12:27:39 l2 openais[1977]: [CLM ] New Configuration:
Nov 19 12:27:39 l2 openais[1977]: [CLM ] Members Left:
Nov 19 12:27:39 l2 openais[1977]: [CLM ] Members Joined:
Nov 19 12:27:39 l2 openais[1977]: [CLM ] CLM CONFIGURATION CHANGE
Nov 19 12:27:39 l2 openais[1977]: [CLM ] New Configuration:
Nov 19 12:27:39 l2 openais[1977]: [CLM ] r(0) ip(192.168.10.11)
Nov 19 12:27:39 l2 openais[1977]: [CLM ] Members Left:
Nov 19 12:27:39 l2 openais[1977]: [CLM ] Members Joined:
Nov 19 12:27:39 l2 openais[1977]: [CLM ] r(0) ip(192.168.10.11)
Nov 19 12:27:39 l2 openais[1977]: [SYNC ] This node is within the
primary component and will provide service.
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] entering OPERATIONAL state.
Nov 19 12:27:39 l2 openais[1977]: [CMAN ] quorum regained, resuming
activity
Nov 19 12:27:39 l2 openais[1977]: [CLM ] got nodejoin message
192.168.10.11
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] entering GATHER state from 11.
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] Saving state aru 9 high seq
received 9
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] Storing new sequence id for
ring e4
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] entering COMMIT state.
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] entering RECOVERY state.
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] position [0] member
192.168.10.10:
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] previous ring seq 224 rep
192.168.10.10
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] aru 1a high delivered 1a
received flag 1
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] position [1] member
192.168.10.11:
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] previous ring seq 224 rep
192.168.10.11
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] aru 9 high delivered 9
received flag 1
Nov 19 12:27:39 l2 openais[1977]: [TOTEM] Did not need to originate any
messages in recovery.
Nov 19 12:27:40 l2 openais[1977]: [CLM ] CLM CONFIGURATION CHANGE
Nov 19 12:27:40 l2 openais[1977]: [CLM ] New Configuration:
Nov 19 12:27:40 l2 openais[1977]: [CLM ] r(0) ip(192.168.10.11)
Nov 19 12:27:40 l2 openais[1977]: [CLM ] Members Left:
Nov 19 12:27:40 l2 openais[1977]: [CLM ] Members Joined:
Nov 19 12:27:40 l2 openais[1977]: [CLM ] CLM CONFIGURATION CHANGE
Nov 19 12:27:40 l2 openais[1977]: [CLM ] New Configuration:
Nov 19 12:27:40 l2 openais[1977]: [CLM ] r(0) ip(192.168.10.10)
Nov 19 12:27:40 l2 openais[1977]: [CLM ] r(0) ip(192.168.10.11)
Nov 19 12:27:40 l2 openais[1977]: [CLM ] Members Left:
Nov 19 12:27:40 l2 openais[1977]: [CLM ] Members Joined:
Nov 19 12:27:40 l2 openais[1977]: [CLM ] r(0) ip(192.168.10.10)
Nov 19 12:27:40 l2 openais[1977]: [SYNC ] This node is within the
primary component and will provide service.
Nov 19 12:27:40 l2 openais[1977]: [TOTEM] entering OPERATIONAL state.
Nov 19 12:27:40 l2 openais[1977]: [CLM ] got nodejoin message
192.168.10.10
Nov 19 12:27:40 l2 openais[1977]: [CLM ] got nodejoin message
192.168.10.11
Nov 19 12:27:40 l2 openais[1977]: [CPG ] got joinlist message from node 2
Nov 19 12:27:40 l2 ccsd[1941]: Initial status:: Quorate
[...]
Nov 19 12:28:37 l2 kernel: dlm: Using TCP for communications
Nov 19 12:28:37 l2 kernel: dlm: connecting to 2
Nov 19 12:28:38 l2 clurgmgrd[2687]: <notice> Resource Group Manager
Starting
Nov 19 12:28:38 l2 kernel: dlm: got connection from 2
[...]
Nov 19 12:28:47 l2 clurgmgrd: [2687]: <info> Executing
/etc/init.d/vsftpd stop
Nov 19 12:28:47 l2 vsftpd: script param: stop
Nov 19 12:30:27 l2 openais[1977]: [TOTEM] The token was lost in the
OPERATIONAL state.
Nov 19 12:30:27 l2 openais[1977]: [TOTEM] Receive multicast socket recv
buffer size (288000 bytes).
Nov 19 12:30:27 l2 openais[1977]: [TOTEM] Transmit multicast socket send
buffer size (219136 bytes).
Nov 19 12:30:27 l2 openais[1977]: [TOTEM] entering GATHER state from 2.
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] entering GATHER state from 0.
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] Creating commit token because
I am the rep.
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] Saving state aru 28 high seq
received 28
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] Storing new sequence id for
ring e8
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] entering COMMIT state.
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] entering RECOVERY state.
Nov 19 12:30:32 l2 fenced[1993]: l1.local not a cluster member after 0
sec post_fail_delay
Nov 19 12:30:32 l2 kernel: dlm: closing connection to node 2
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] position [0] member
192.168.10.11:
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] previous ring seq 228 rep
192.168.10.10
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] aru 28 high delivered 28
received flag 1
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] Did not need to originate any
messages in recovery.
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] Sending initial ORF token
Nov 19 12:30:32 l2 fenced[1993]: fencing node "l1.local"
Nov 19 12:30:32 l2 openais[1977]: [CLM ] CLM CONFIGURATION CHANGE
Nov 19 12:30:32 l2 fence_manual: Node l1.local needs to be reset before
recovery can procede. Waiting for l1.local to rejoin the cluster or for
manual acknowledgement that it has been reset (i.e. fence_ack_manual -n
l1.local)
Nov 19 12:30:32 l2 openais[1977]: [CLM ] New Configuration:
Nov 19 12:30:32 l2 openais[1977]: [CLM ] r(0) ip(192.168.10.11)
Nov 19 12:30:32 l2 openais[1977]: [CLM ] Members Left:
Nov 19 12:30:32 l2 openais[1977]: [CLM ] r(0) ip(192.168.10.10)
Nov 19 12:30:32 l2 openais[1977]: [CLM ] Members Joined:
Nov 19 12:30:32 l2 openais[1977]: [CLM ] CLM CONFIGURATION CHANGE
Nov 19 12:30:32 l2 openais[1977]: [CLM ] New Configuration:
Nov 19 12:30:32 l2 openais[1977]: [CLM ] r(0) ip(192.168.10.11)
Nov 19 12:30:32 l2 openais[1977]: [CLM ] Members Left:
Nov 19 12:30:32 l2 openais[1977]: [CLM ] Members Joined:
Nov 19 12:30:32 l2 openais[1977]: [SYNC ] This node is within the
primary component and will provide service.
Nov 19 12:30:32 l2 openais[1977]: [TOTEM] entering OPERATIONAL state.
Nov 19 12:30:32 l2 openais[1977]: [CLM ] got nodejoin message
192.168.10.11
Nov 19 12:30:32 l2 openais[1977]: [CPG ] got joinlist message from node 1
Nov 19 12:30:52 l2 fenced[1993]: fence "l1.local" success
Nov 19 12:30:58 l2 clurgmgrd[2687]: <notice> Taking over service
service:vsftpd from down member l1.local
Nov 19 12:30:58 l2 clurgmgrd: [2687]: <info> Executing
/etc/init.d/vsftpd start
Nov 19 12:30:58 l2 vsftpd: script param: start
Nov 19 12:30:59 l2 clurgmgrd[2687]: <notice> Service service:vsftpd started
Nov 19 12:31:07 l2 clurgmgrd: [2687]: <info> Executing
/etc/init.d/vsftpd status
Nov 19 12:31:07 l2 vsftpd: script param: status
Nov 19 12:31:37 l2 clurgmgrd: [2687]: <info> Executing
/etc/init.d/vsftpd status
Nov 19 12:31:37 l2 vsftpd: script param: status
Nov 19 12:32:07 l2 clurgmgrd: [2687]: <info> Executing
/etc/init.d/vsftpd status
Nov 19 12:32:07 l2 vsftpd: script param: status
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] entering GATHER state from 11.
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] Saving state aru 18 high seq
received 18
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] Storing new sequence id for
ring ec
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] entering COMMIT state.
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] entering RECOVERY state.
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] position [0] member
192.168.10.10:
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] previous ring seq 232 rep
192.168.10.10
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] aru 9 high delivered 8
received flag 1
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] position [1] member
192.168.10.11:
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] previous ring seq 232 rep
192.168.10.11
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] aru 18 high delivered 18
received flag 1
Nov 19 12:32:25 l2 openais[1977]: [TOTEM] Did not need to originate any
messages in recovery.
Nov 19 12:32:25 l2 openais[1977]: [CLM ] CLM CONFIGURATION CHANGE
Nov 19 12:32:25 l2 openais[1977]: [CLM ] New Configuration:
Nov 19 12:32:25 l2 openais[1977]: [CLM ] r(0) ip(192.168.10.11)
Nov 19 12:32:25 l2 openais[1977]: [CLM ] Members Left:
Nov 19 12:32:25 l2 openais[1977]: [CLM ] Members Joined:
Nov 19 12:32:25 l2 openais[1977]: [CLM ] CLM CONFIGURATION CHANGE
Nov 19 12:32:25 l2 openais[1977]: [CLM ] New Configuration:
Nov 19 12:32:25 l2 openais[1977]: [CLM ] r(0) ip(192.168.10.10)
Nov 19 12:32:25 l2 openais[1977]: [CLM ] r(0) ip(192.168.10.11)
Nov 19 12:32:25 l2 openais[1977]: [CLM ] Members Left:
Nov 19 12:32:25 l2 openais[1977]: [CLM ] Members Joined:
Nov 19 12:32:25 l2 openais[1977]: [CLM ] r(0) ip(192.168.10.10)
Nov 19 12:32:26 l2 openais[1977]: [SYNC ] This node is within the
primary component and will provide service.
Nov 19 12:32:26 l2 openais[1977]: [TOTEM] entering OPERATIONAL state.
Nov 19 12:32:26 l2 openais[1977]: [CLM ] got nodejoin message
192.168.10.10
Nov 19 12:32:26 l2 openais[1977]: [CLM ] got nodejoin message
192.168.10.11
Nov 19 12:32:26 l2 openais[1977]: [CPG ] got joinlist message from node 1
Nov 19 12:32:26 l2 clurgmgrd[2687]: <notice> Stopping service
service:vsftpd
Nov 19 12:32:41 l2 clurgmgrd[2687]: <err> #52: Failed changing RG status
Nov 19 12:32:56 l2 clurgmgrd[2687]: <err> #57: Failed changing RG status
Nov 19 12:32:57 l2 clurgmgrd: [2687]: <info> Executing
/etc/init.d/vsftpd status
Nov 19 12:32:57 l2 vsftpd: script param: status
Nov 19 12:33:20 l2 kernel: dlm: connecting to 2
Nov 19 12:33:20 l2 kernel: dlm: got connection from 2
Nov 19 12:33:36 l2 clurgmgrd: [2687]: <info> Executing
/etc/init.d/vsftpd status
Nov 19 12:33:36 l2 vsftpd: script param: status
Nov 19 12:34:06 l2 clurgmgrd: [2687]: <info> Executing
/etc/init.d/vsftpd status
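The timing around the failed relocation can be lined up by offset from the moment node 1 rejoins. A minimal Python sketch, assuming the lines below (copied from the log above, with wrapped lines rejoined) are the relevant ones; the year is arbitrary because syslog omits it, and `offsets` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Lines copied from the log above; wrapped lines have been rejoined.
LOG_LINES = [
    "Nov 19 12:32:26 l2 openais[1977]: [CPG ] got joinlist message from node 1",
    "Nov 19 12:32:26 l2 clurgmgrd[2687]: <notice> Stopping service service:vsftpd",
    "Nov 19 12:32:41 l2 clurgmgrd[2687]: <err> #52: Failed changing RG status",
    "Nov 19 12:32:56 l2 clurgmgrd[2687]: <err> #57: Failed changing RG status",
    "Nov 19 12:33:20 l2 kernel: dlm: connecting to 2",
]

# Timestamp, host name, then the rest of the message.
STAMP = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ (.*)$")


def offsets(lines):
    """Return (seconds since the first timestamped line, message) pairs."""
    parsed = []
    for line in lines:
        match = STAMP.match(line)
        if match is None:
            continue  # continuation lines carry no timestamp
        # The year 2000 is a placeholder; only differences are used.
        when = datetime.strptime("2000 " + match.group(1), "%Y %b %d %H:%M:%S")
        parsed.append((when, match.group(2)))
    start = parsed[0][0]
    return [(int((when - start).total_seconds()), msg) for when, msg in parsed]


for seconds, message in offsets(LOG_LINES):
    print("+%3ds  %s" % (seconds, message))
```

The two "Failed changing RG status" errors follow the stop request at 15-second intervals, and the dlm connection to node 2 appears only 54 seconds after the join. If that dlm connection marks rgmanager coming up on l1.local (an assumption, based on the 12:28:37 and 12:28:38 lines earlier in the log), this would fit the idea that the relocation starts before rgmanager is running on the returning node.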
daro
Dariusz Skorupa
Service Engineer, WASKO S.A.