Hi,

I have a problem with a two-node cluster. When I force one node to fail, the second node fences it. When the first node rejoins the cluster, cman shuts down on both nodes, saying:

Sep 28 17:29:36 s64lmwbig3c openais[7273]: [MAIN ] Killing node s64lmwbig3b because it has rejoined the cluster with existing state
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CMAN ] cman killed by node 1 because we rejoined the cluster without a full restart

Logs: see attached.

Conf:

<?xml version="1.0"?>
<cluster config_version="12" name="u64lmwbig8r">
<cman expected_votes="1" two_node="1">
<multicast addr="239.192.0.11"/>
</cman>
<clusternodes>
<clusternode name="s64lmwbig3b" nodeid="1"
votes="1">
<fence>
<method name="single">
<device name="fenceHP_g3b"/>
</method>
</fence>
</clusternode>
<clusternode name="s64lmwbig3c" nodeid="2"
votes="1">
<fence>
<method name="single">
<device name="fenceHP_g3c"/>
</method>
</fence>
</clusternode>
</clusternodes>
<fencedevices>
<fencedevice agent="fence_ipmilan" ipaddr="XXXXX"
lanplus="1" login="user" name="fenceHP_g3b"
passwd="password" verbose="yes"/>
<fencedevice agent="fence_ipmilan" ipaddr="XXXXX"
lanplus="1" login="user" name="fenceHP_g3c"
passwd="password" verbose="yes"/>
</fencedevices>
<rm>
<failoverdomains/>
<resources/>
</rm>
<fence_daemon clean_start="0" post_fail_delay="20" post_join_delay="60"/>
</cluster>

Do you know what I missed?

Thanks.

Regards,
Jean-Daniel BONNETOT
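To rule out a simple typo in the configuration above, here is a minimal sketch that checks that every fence device referenced by a node is declared under <fencedevices>, and that reads back the cman attributes. The function names `undefined_fence_devices` and `cman_settings` are hypothetical helpers introduced only for this illustration; the embedded XML is a copy of the posted config. It only checks internal consistency of the file and says nothing about whether fencing works on the hardware.

```python
import xml.etree.ElementTree as ET

# Copy of the cluster.conf posted above (addresses and credentials as posted).
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster config_version="12" name="u64lmwbig8r">
  <cman expected_votes="1" two_node="1">
    <multicast addr="239.192.0.11"/>
  </cman>
  <clusternodes>
    <clusternode name="s64lmwbig3b" nodeid="1" votes="1">
      <fence><method name="single"><device name="fenceHP_g3b"/></method></fence>
    </clusternode>
    <clusternode name="s64lmwbig3c" nodeid="2" votes="1">
      <fence><method name="single"><device name="fenceHP_g3c"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="XXXXX" lanplus="1"
                 login="user" name="fenceHP_g3b" passwd="password" verbose="yes"/>
    <fencedevice agent="fence_ipmilan" ipaddr="XXXXX" lanplus="1"
                 login="user" name="fenceHP_g3c" passwd="password" verbose="yes"/>
  </fencedevices>
  <rm><failoverdomains/><resources/></rm>
  <fence_daemon clean_start="0" post_fail_delay="20" post_join_delay="60"/>
</cluster>
"""


def undefined_fence_devices(conf_text):
    """Map each node name to the fence device names it uses but that are not declared."""
    root = ET.fromstring(conf_text)
    declared = {d.get("name") for d in root.findall("./fencedevices/fencedevice")}
    missing = {}
    for node in root.findall("./clusternodes/clusternode"):
        used = [d.get("name") for d in node.findall("./fence/method/device")]
        bad = [name for name in used if name not in declared]
        if bad:
            missing[node.get("name")] = bad
    return missing


def cman_settings(conf_text):
    """Return the attributes of the <cman> element as a plain dict."""
    cman = ET.fromstring(conf_text).find("cman")
    return dict(cman.attrib) if cman is not None else {}


if __name__ == "__main__":
    print("undefined fence devices:", undefined_fence_devices(CLUSTER_CONF))
    print("cman settings:", cman_settings(CLUSTER_CONF))
```

An empty result from `undefined_fence_devices` only means the device names match up; it does not show that the IPMI devices actually power-cycle the node.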
Sep 28 17:25:23 s64lmwbig3c fenced[7294]: s64lmwbig3b not a cluster member after 20 sec post_fail_delay
Sep 28 17:25:23 s64lmwbig3c fenced[7294]: fencing node "s64lmwbig3b"
Sep 28 17:25:34 s64lmwbig3c fenced[7294]: fence "s64lmwbig3b" success
[...]
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] entering GATHER state from 11.
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] Saving state aru 13 high seq received 13
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] Storing new sequence id for ring 1c8
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] entering COMMIT state.
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] entering RECOVERY state.
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] position [0] member 10.151.231.215:
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] previous ring seq 452 rep 10.151.231.215
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] aru c high delivered c received flag 1
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] position [1] member 10.151.231.216:
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] previous ring seq 452 rep 10.151.231.216
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] aru 13 high delivered 13 received flag 1
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] Did not need to originate any messages in recovery.
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM ] CLM CONFIGURATION CHANGE
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM ] New Configuration:
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM ] r(0) ip(10.151.231.216)
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM ] Members Left:
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM ] Members Joined:
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM ] CLM CONFIGURATION CHANGE
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM ] New Configuration:
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM ] r(0) ip(10.151.231.215)
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM ] r(0) ip(10.151.231.216)
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM ] Members Left:
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM ] Members Joined:
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM ] r(0) ip(10.151.231.215)
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [SYNC ] This node is within the primary component and will provide service.
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] entering OPERATIONAL state.
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM ] got nodejoin message 10.151.231.215
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM ] got nodejoin message 10.151.231.216
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CPG ] got joinlist message from node 2
Sep 28 17:29:20 s64lmwbig3c kernel: dlm: got connection from 1
[...]
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] entering GATHER state from 11.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] Creating commit token because I am the rep.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] Saving state aru 2f high seq received 2f
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] Storing new sequence id for ring 1cc
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] entering COMMIT state.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] entering RECOVERY state.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] position [0] member 10.151.231.216:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] previous ring seq 456 rep 10.151.231.215
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] aru 2f high delivered 2f received flag 1
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] Did not need to originate any messages in recovery.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] Sending initial ORF token
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] CLM CONFIGURATION CHANGE
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] New Configuration:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] r(0) ip(10.151.231.216)
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] Members Left:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] r(0) ip(10.151.231.215)
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] Members Joined:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] CLM CONFIGURATION CHANGE
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] New Configuration:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] r(0) ip(10.151.231.216)
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] Members Left:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] Members Joined:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [SYNC ] This node is within the primary component and will provide service.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] entering OPERATIONAL state.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] got nodejoin message 10.151.231.216
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CPG ] got joinlist message from node 2
Sep 28 17:29:36 s64lmwbig3c kernel: dlm: closing connection to node 1
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] entering GATHER state from 9.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] Saving state aru e high seq received e
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] Storing new sequence id for ring 1d0
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] entering COMMIT state.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] entering RECOVERY state.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] position [0] member 10.151.231.215:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] previous ring seq 460 rep 10.151.231.215
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] aru f high delivered f received flag 1
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] position [1] member 10.151.231.216:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] previous ring seq 460 rep 10.151.231.216
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] aru e high delivered e received flag 1
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] Did not need to originate any messages in recovery.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] CLM CONFIGURATION CHANGE
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] New Configuration:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] r(0) ip(10.151.231.216)
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] Members Left:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] Members Joined:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] CLM CONFIGURATION CHANGE
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] New Configuration:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] r(0) ip(10.151.231.215)
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] r(0) ip(10.151.231.216)
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] Members Left:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] Members Joined:
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] r(0) ip(10.151.231.215)
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [SYNC ] This node is within the primary component and will provide service.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [TOTEM] entering OPERATIONAL state.
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [MAIN ] Killing node s64lmwbig3b because it has rejoined the cluster with existing state
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CMAN ] cman killed by node 1 because we rejoined the cluster without a full restart
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] got nodejoin message 10.151.231.215
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CLM ] got nodejoin message 10.151.231.216
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CPG ] got joinlist message from node 1
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [CPG ] got joinlist message from node 2
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading all openais components
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_confdb v0 (20/10)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_cpg v0 (19/8)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_cfg v0 (18/7)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_msg v0 (17/6)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_lck v0 (16/5)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_evt v0 (15/4)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_ckpt v0 (14/3)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_amf v0 (13/2)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_clm v0 (12/1)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_evs v0 (11/0)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] Unloading openais component: openais_cman v0 (10/9)
Sep 28 17:29:37 s64lmwbig3c openais[7273]: [SERV ] AIS Executive exiting (reason: CMAN kill requested, exiting).
Sep 28 17:29:37 s64lmwbig3c gfs_controld[7306]: cluster is down, exiting
Sep 28 17:29:37 s64lmwbig3c dlm_controld[7300]: cluster is down, exiting
Sep 28 17:29:37 s64lmwbig3c kernel: dlm: closing connection to node 2
Sep 28 17:29:37 s64lmwbig3c clurgmgrd[8204]: <warning> #67: Shutting down uncleanly
Sep 28 17:29:37 s64lmwbig3c clurgmgrd[8204]: <notice> Shutdown complete, exiting
Sep 28 17:29:37 s64lmwbig3c syslogd: /dev/console: Invalid argument
Sep 28 17:30:03 s64lmwbig3c ccsd[7263]: Unable to connect to cluster infrastructure after 30 seconds.
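The log above is long, so it may help to reduce it to the fencing, join, and kill events. The following is a minimal sketch, assuming the usual syslog layout of "date time host process[pid]: message". The `key_events` function and the `MARKERS` list are hypothetical and chosen only for this illustration; the sample lines are copied from the log.

```python
import re

# One syslog line: "Sep 28 17:25:23 host proc[pid]: message" (the pid is optional).
LOG_LINE = re.compile(
    r"^(?P<date>\w{3} +\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<proc>[^\s:\[]+)(?:\[\d+\])?: (?P<msg>.*)$"
)

# Substrings that mark the events of interest (an arbitrary choice for this sketch).
MARKERS = (
    "fencing node",
    'fence "',
    "nodejoin",
    "closing connection",
    "Killing node",
    "cman killed",
)

# A few lines copied from the log above.
SAMPLE_LOG = """\
Sep 28 17:25:23 s64lmwbig3c fenced[7294]: fencing node "s64lmwbig3b"
Sep 28 17:25:34 s64lmwbig3c fenced[7294]: fence "s64lmwbig3b" success
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [TOTEM] entering GATHER state from 11.
Sep 28 17:29:15 s64lmwbig3c openais[7273]: [CLM ] got nodejoin message 10.151.231.215
Sep 28 17:29:36 s64lmwbig3c kernel: dlm: closing connection to node 1
Sep 28 17:29:36 s64lmwbig3c openais[7273]: [MAIN ] Killing node s64lmwbig3b because it has rejoined the cluster with existing state
"""


def key_events(log_text):
    """Return (time, process, message) for every line that contains a marker."""
    events = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and any(marker in match.group("msg") for marker in MARKERS):
            events.append((match.group("time"), match.group("proc"), match.group("msg")))
    return events


if __name__ == "__main__":
    for time, proc, msg in key_events(SAMPLE_LOG):
        print(time, proc, msg)
```

Run against the full log, this kind of filter makes the sequence easier to follow: the fence at 17:25, the rejoin at 17:29:15, then the drop and second rejoin within the same second at 17:29:36.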
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster