Hi
I have two DL585 servers with shared storage on an MSA 1000, in a two-node RHEL 5.3 cluster. The failover domain priorities in cluster.conf are as below.
<failoverdomainnode name="usrylxap237.merck.com" priority="1"/>
<failoverdomainnode name="usrylxap238.merck.com" priority="2"/>
Whenever the lower-priority node usrylxap238 is rebooted, it kills cman on usrylxap237 (the higher-priority node) and fences it, causing it to reboot. The messages I see in /var/log/messages on the higher-priority node are:
Jun 26 11:02:36 usrylxap237 openais[4750]: [CMAN ] cman killed by node 2 because we rejoined the cluster without a full restart
Jun 26 11:03:57 usrylxap237 openais[27373]: [CMAN ] cman killed by node 1 because we were killed by cman_tool or other application
After the reboot, when the higher-priority node usrylxap237 comes up, it transfers the services from the lower-priority node to itself and everything works fine for some time. Then I see the following messages in /var/log/messages on the higher-priority node, which is running the services:
Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] The token was lost in the OPERATIONAL state.
Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] Transmit multicast socket send buffer size (288000 bytes).
Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] entering GATHER state from 2.
Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] Creating commit token because I am the rep.
Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] Saving state aru 17 high seq received 17
Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] Storing new sequence id for ring 420
Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] entering COMMIT state.
Jun 26 09:24:36 usrylxap237 openais[5792]: [TOTEM] The token was lost in the COMMIT state.
Jun 26 09:24:36 usrylxap237 openais[5792]: [TOTEM] entering GATHER state from 4.
Jun 26 09:24:36 usrylxap237 openais[5792]: [TOTEM] Creating commit token because I am the rep.
Jun 26 09:24:36 usrylxap237 openais[5792]: [TOTEM] Storing new sequence id for ring 424
Jun 26 09:24:36 usrylxap237 openais[5792]: [TOTEM] entering COMMIT state.
Jun 26 09:24:46 usrylxap237 openais[5792]: [TOTEM] The token was lost in the COMMIT state.
Jun 26 09:24:46 usrylxap237 openais[5792]: [TOTEM] entering GATHER state from 4.
Jun 26 09:24:46 usrylxap237 openais[5792]: [TOTEM] Creating commit token because I am the rep.
Jun 26 09:24:46 usrylxap237 openais[5792]: [TOTEM] Storing new sequence id for ring 428
Jun 26 09:24:46 usrylxap237 openais[5792]: [TOTEM] entering COMMIT state.
Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] entering RECOVERY state.
Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] position [0] member 54.3.254.237:
Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] previous ring seq 1052 rep 54.3.254.237
Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] aru 17 high delivered 17 received flag 1
Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] position [1] member 54.3.254.238:
Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] previous ring seq 1052 rep 54.3.254.237
Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] aru 17 high delivered 17 received flag 1
Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] Did not need to originate any messages in recovery.
Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] Sending initial ORF token
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM ] CLM CONFIGURATION CHANGE
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM ] New Configuration:
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM ] r(0) ip(54.3.254.237)
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM ] r(0) ip(54.3.254.238)
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM ] Members Left:
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM ] Members Joined:
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM ] CLM CONFIGURATION CHANGE
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM ] New Configuration:
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM ] r(0) ip(54.3.254.237)
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM ] r(0) ip(54.3.254.238)
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM ] Members Left:
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM ] Members Joined:
Jun 26 09:24:54 usrylxap237 openais[5792]: [SYNC ] This node is within the primary component and will provide service.
Jun 26 09:24:54 usrylxap237 openais[5792]: [TOTEM] entering OPERATIONAL state.
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM ] got nodejoin message 54.3.254.237
Jun 26 09:24:54 usrylxap237 openais[5792]: [CLM ] got nodejoin message 54.3.254.238
Jun 26 09:24:54 usrylxap237 openais[5792]: [CPG ] got joinlist message from node 1
Jun 26 09:24:54 usrylxap237 openais[5792]: [CPG ] got joinlist message from node 2
Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] The token was lost in the OPERATIONAL state.
Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] Transmit multicast socket send buffer size (288000 bytes).
Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] entering GATHER state from 2.
Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] Creating commit token because I am the rep.
Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] Saving state aru 17 high seq received 17
Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] Storing new sequence id for ring 42c
Jun 26 09:25:23 usrylxap237 openais[5792]: [TOTEM] entering COMMIT state.
Jun 26 09:25:33 usrylxap237 openais[5792]: [TOTEM] Creating commit token because I am the rep.
Jun 26 09:25:33 usrylxap237 openais[5792]: [TOTEM] Storing new sequence id for ring 430
Jun 26 09:25:33 usrylxap237 openais[5792]: [TOTEM] entering COMMIT state.
Jun 26 09:25:33 usrylxap237 openais[5792]: [TOTEM] entering GATHER state from 13.
Jun 26 09:25:33 usrylxap237 openais[5792]: [TOTEM] Creating commit token because I am the rep.
Jun 26 09:25:33 usrylxap237 openais[5792]: [TOTEM] Storing new sequence id for ring 434
Jun 26 09:25:33 usrylxap237 openais[5792]: [TOTEM] entering COMMIT state.
Jun 26 09:25:43 usrylxap237 openais[5792]: [TOTEM] Creating commit token because I am the rep.
Jun 26 09:25:43 usrylxap237 openais[5792]: [TOTEM] Storing new sequence id for ring 438
Jun 26 09:25:43 usrylxap237 openais[5792]: [TOTEM] entering COMMIT state.
Jun 26 09:25:43 usrylxap237 openais[5792]: [TOTEM] entering GATHER state from 13.
Jun 26 09:25:43 usrylxap237 openais[5792]: [TOTEM] Creating commit token because I am the rep.
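To see how often the token is lost and in which state, a log like the one above can be summarized with a short script. This is only a minimal sketch, assuming syslog lines in exactly the format shown here; the function name count_token_losses and the regular expression are illustrative and not part of any cluster tool.

```python
import re
from collections import Counter

# Matches syslog lines such as:
#   Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] The token was lost in the OPERATIONAL state.
TOKEN_LOST = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+openais\[\d+\]:\s+"
    r"\[TOTEM\s*\]\s+The token was lost in the (?P<state>\w+) state"
)


def count_token_losses(lines):
    """Return a Counter keyed by (host, state) for every token-loss line."""
    losses = Counter()
    for line in lines:
        match = TOKEN_LOST.match(line)
        if match:
            losses[(match.group("host"), match.group("state"))] += 1
    return losses


# Sample lines copied from the logs in this message.
sample = [
    "Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] The token was lost in the OPERATIONAL state.",
    "Jun 26 09:24:26 usrylxap237 openais[5792]: [TOTEM] entering GATHER state from 2.",
    "Jun 26 09:24:36 usrylxap237 openais[5792]: [TOTEM] The token was lost in the COMMIT state.",
    "Jun 26 09:24:46 usrylxap238 openais[5725]: [TOTEM] The token was lost in the COMMIT state.",
]

for (host, state), n in sorted(count_token_losses(sample).items()):
    print(f"{host} {state}: {n}")
```

To run it against a real log, pass an open file instead of the sample list, for example count_token_losses(open("/var/log/messages")).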
On the second node I can see:
Jun 26 09:24:26 usrylxap238 openais[5725]: [TOTEM] entering GATHER state from 12.
Jun 26 09:24:26 usrylxap238 openais[5725]: [TOTEM] Saving state aru 17 high seq received 17
Jun 26 09:24:26 usrylxap238 openais[5725]: [TOTEM] Storing new sequence id for ring 420
Jun 26 09:24:26 usrylxap238 openais[5725]: [TOTEM] entering COMMIT state.
Jun 26 09:24:36 usrylxap238 openais[5725]: [TOTEM] entering GATHER state from 13.
Jun 26 09:24:36 usrylxap238 openais[5725]: [TOTEM] Storing new sequence id for ring 424
Jun 26 09:24:36 usrylxap238 openais[5725]: [TOTEM] entering COMMIT state.
Jun 26 09:24:46 usrylxap238 openais[5725]: [TOTEM] The token was lost in the COMMIT state.
Jun 26 09:24:46 usrylxap238 openais[5725]: [TOTEM] entering GATHER state from 4.
Jun 26 09:24:46 usrylxap238 openais[5725]: [TOTEM] Storing new sequence id for ring 428
Jun 26 09:24:46 usrylxap238 openais[5725]: [TOTEM] entering COMMIT state.
Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] entering RECOVERY state.
Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] position [0] member 54.3.254.237:
Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] previous ring seq 1052 rep 54.3.254.237
Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] aru 17 high delivered 17 received flag 1
Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] position [1] member 54.3.254.238:
Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] previous ring seq 1052 rep 54.3.254.237
Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] aru 17 high delivered 17 received flag 1
Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] Did not need to originate any messages in recovery.
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM ] CLM CONFIGURATION CHANGE
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM ] New Configuration:
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM ] r(0) ip(54.3.254.237)
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM ] r(0) ip(54.3.254.238)
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM ] Members Left:
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM ] Members Joined:
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM ] CLM CONFIGURATION CHANGE
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM ] New Configuration:
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM ] r(0) ip(54.3.254.237)
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM ] r(0) ip(54.3.254.238)
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM ] Members Left:
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM ] Members Joined:
Jun 26 09:24:54 usrylxap238 openais[5725]: [SYNC ] This node is within the primary component and will provide service.
Jun 26 09:24:54 usrylxap238 openais[5725]: [TOTEM] entering OPERATIONAL state.
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM ] got nodejoin message 54.3.254.237
Jun 26 09:24:54 usrylxap238 openais[5725]: [CLM ] got nodejoin message 54.3.254.238
Jun 26 09:24:54 usrylxap238 openais[5725]: [CPG ] got joinlist message from node 1
Jun 26 09:24:54 usrylxap238 openais[5725]: [CPG ] got joinlist message from node 2
Jun 26 09:25:23 usrylxap238 openais[5725]: [TOTEM] entering GATHER state from 12.
Jun 26 09:25:23 usrylxap238 openais[5725]: [TOTEM] Saving state aru 17 high seq received 17
Jun 26 09:25:23 usrylxap238 openais[5725]: [TOTEM] Storing new sequence id for ring 42c
Jun 26 09:25:23 usrylxap238 openais[5725]: [TOTEM] entering COMMIT state.
Jun 26 09:25:33 usrylxap238 openais[5725]: [TOTEM] The token was lost in the COMMIT state.
Jun 26 09:25:33 usrylxap238 openais[5725]: [TOTEM] entering GATHER state from 4.
Jun 26 09:25:33 usrylxap238 openais[5725]: [TOTEM] Storing new sequence id for ring 430
Jun 26 09:25:33 usrylxap238 openais[5725]: [TOTEM] entering COMMIT state.
Jun 26 09:25:33 usrylxap238 openais[5725]: [TOTEM] entering GATHER state from 13.
Jun 26 09:25:33 usrylxap238 openais[5725]: [TOTEM] Storing new sequence id for ring 434
Jun 26 09:25:33 usrylxap238 openais[5725]: [TOTEM] entering COMMIT state.
Jun 26 09:25:43 usrylxap238 openais[5725]: [TOTEM] The token was lost in the COMMIT state.
Jun 26 09:25:43 usrylxap238 openais[5725]: [TOTEM] entering GATHER state from 4.
Jun 26 09:25:43 usrylxap238 openais[5725]: [TOTEM] Storing new sequence id for ring 438
Jun 26 09:25:43 usrylxap238 openais[5725]: [TOTEM] entering COMMIT state.
Jun 26 09:25:43 usrylxap238 openais[5725]: [TOTEM] entering GATHER state from 13.
Jun 26 09:25:43 usrylxap238 openais[5725]: [TOTEM] Storing new sequence id for ring 43c
Jun 26 09:25:43 usrylxap238 openais[5725]: [TOTEM] entering COMMIT state.
Jun 26 09:25:53 usrylxap238 openais[5725]: [TOTEM] The token was lost in the COMMIT state.
Jun 26 09:25:53 usrylxap238 openais[5725]: [TOTEM] entering GATHER state from 4.
Jun 26 09:25:53 usrylxap238 openais[5725]: [TOTEM] Storing new sequence id for ring 440
Jun 26 09:25:53 usrylxap238 openais[5725]: [TOTEM] entering COMMIT state.
Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] entering RECOVERY state.
Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] position [0] member 54.3.254.237:
Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] previous ring seq 1064 rep 54.3.254.237
Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] aru 17 high delivered 17 received flag 1
Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] position [1] member 54.3.254.238:
Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] previous ring seq 1064 rep 54.3.254.237
Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] aru 17 high delivered 17 received flag 1
Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] Did not need to originate any messages in recovery.
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM ] CLM CONFIGURATION CHANGE
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM ] New Configuration:
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM ] r(0) ip(54.3.254.237)
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM ] r(0) ip(54.3.254.238)
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM ] Members Left:
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM ] Members Joined:
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM ] CLM CONFIGURATION CHANGE
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM ] New Configuration:
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM ] r(0) ip(54.3.254.237)
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM ] r(0) ip(54.3.254.238)
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM ] Members Left:
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM ] Members Joined:
Jun 26 09:25:54 usrylxap238 openais[5725]: [SYNC ] This node is within the primary component and will provide service.
Jun 26 09:25:54 usrylxap238 openais[5725]: [TOTEM] entering OPERATIONAL state.
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM ] got nodejoin message 54.3.254.237
Jun 26 09:25:54 usrylxap238 openais[5725]: [CLM ] got nodejoin message 54.3.254.238
Jun 26 09:25:54 usrylxap238 openais[5725]: [CPG ] got joinlist message from node 1
Jun 26 09:25:54 usrylxap238 openais[5725]: [CPG ] got joinlist message from node 2
Now my cluster is messed up, even though clustat and cman_tool show that everything is fine: I cannot move services between the nodes (they are running fine on their present node), and clusvcadm does not even give an error message when I try to move them.
[root@usrylxap238 ~]# clustat
Cluster Status for cluster1 @ Sat Jun 26 11:25:12 2010
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
usrylxap237.merck.com 1 Online, rgmanager
usrylxap238.merck.com 2 Online, Local, rgmanager
Service Name Owner (Last) State
------- ---- ----- ------ -----
service:http-service usrylxap237.merck.com started
service:mysql usrylxap237.merck.com started
[root@usrylxap238 ~]# cman_tool status
Version: 6.1.0
Config Version: 32
Cluster Name: cluster1
Cluster Id: 26777
Cluster Member: Yes
Cluster Generation: 1276
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1
Active subsystems: 9
Flags: 2node Dirty
Ports Bound: 0 11 177
Node name: usrylxap238.merck.com
Node ID: 2
Multicast addresses: 239.192.104.2
Node addresses: 54.3.254.238
I have clvmd running with locking_type = 3 and a GFS2 file system mounted (using DLM). It is now hanging on the higher-priority node but is fine on the lower-priority node (which no longer seems to be part of the cluster).
[root@usrylxap237 ~]# service gfs2 status
Active GFS2 mountpoints:
/oracluster1
[root@usrylxap238 ~]# service gfs2 status
Configured GFS2 mountpoints:
/oracluster1
Active GFS2 mountpoints:
/oracluster1
I am not sure why the cluster is losing membership and going stale, and why the GFS2 file system is not accessible.
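As a side note on the cman_tool status output above (Expected votes: 1, Quorum: 1, Flags: 2node): the figures are consistent with two-node mode, where a single vote is enough for quorum. The sketch below only illustrates that arithmetic under a simplified rule (a majority of the expected votes, or 1 in two-node mode); the function cman_quorum is hypothetical and the real calculation in cman may take more inputs into account.

```python
def cman_quorum(expected_votes, two_node=False):
    """Simplified quorum rule (an assumption, not cman's actual code):
    a majority of the expected votes, or 1 when two-node mode is on."""
    if two_node:
        return 1
    return expected_votes // 2 + 1


# Values taken from the cman_tool status output in this message.
print(cman_quorum(expected_votes=1, two_node=True))

# For comparison, a hypothetical three-node cluster with one vote per node.
print(cman_quorum(expected_votes=3))
```

Under this rule either node stays quorate on its own when the other one disappears, so in a two-node cluster fencing, not quorum, decides which node survives a split.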
Thanks
Anoop
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster