I have a 3-node cluster running cman-2.0.84-2.el5. At times we have
spanning tree events that cause network storms lasting up to 9 seconds.
When these events occur (today we caused them twice to verify this issue),
all three nodes go down within seconds of the event. The second time we
tried it, I added the totem token statement shown below. Same problem.

<cman>
<multicast addr="225.0.0.11"/>
<totem token="21000"/>
</cman>

Aug 5 16:41:18 csarcsys2-eth0 ntpd[3484]: kernel time sync enabled 0001
Aug 5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] The token was lost in the OPERATIONAL state.
Aug 5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Aug 5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Aug 5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] entering GATHER state from 2.
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] entering GATHER state from 0.
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Creating commit token because I am the rep.
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Saving state aru 46 high seq received 46
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Storing new sequence id for ring b50
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] entering COMMIT state.
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] entering RECOVERY state.
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] position [0] member 172.xx.xx.xxx:
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] previous ring seq 2892 rep 172.xx.xxx.xx
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] aru 46 high delivered 46 received flag 1
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Did not need to originate any messages in recovery.
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Sending initial ORF token
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] CLM CONFIGURATION CHANGE
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] New Configuration:
Aug 5 16:41:24 csarcsys2-eth0 kernel: dlm: closing connection to node 1
Aug 5 16:41:24 csarcsys2-eth0 clurgmgrd[3750]: <emerg> #1: Quorum Dissolved
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] r(0) ip(172.xx.xxx.xx)
Aug 5 16:41:24 csarcsys2-eth0 kernel: dlm: closing connection to node 3
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] Members Left:
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] r(0) ip(172.xx.xxx.xx)
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] r(0) ip(172.xx.xxx.xx)
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] Members Joined:
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CMAN ] quorum lost, blocking activity
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] CLM CONFIGURATION CHANGE
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] New Configuration:
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] r(0) ip(172.xx.xxx.xx)
Aug 5 16:41:24 csarcsys2-eth0 ccsd[3031]: Cluster is not quorate. Refusing connection.
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] Members Left:
Aug 5 16:41:24 csarcsys2-eth0 ccsd[3031]: Error while processing connect: Connection refused
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] Members Joined:
Aug 5 16:41:24 csarcsys2-eth0 ccsd[3031]: Invalid descriptor specified (-111).
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [SYNC ] This node is within the primary component and will provide service.
Aug 5 16:41:24 csarcsys2-eth0 ccsd[3031]: Someone may be attempting something evil.
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] entering OPERATIONAL state.
Aug 5 16:41:24 csarcsys2-eth0 ccsd[3031]: Error while processing get: Invalid request descriptor
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] got nodejoin message 172.24.86.143
Aug 5 16:41:24 csarcsys2-eth0 ccsd[3031]: Cluster is not quorate. Refusing connection.
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CPG ] got joinlist message from node 2
Aug 5 16:41:24 csarcsys2-eth0 ccsd[3031]: Error while processing connect: Connection refused
Aug 5 16:41:24 csarcsys2-eth0 ccsd[3031]: Invalid descriptor specified (-111).
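
In case it helps anyone reading the log above, here is a minimal Python sketch for pulling the token-loss, state-change, and quorum lines out of a syslog excerpt like this one. It is not part of cman or openais; the names `membership_events` and `seconds_between` are my own, and it assumes one syslog entry per line, in the format shown above, with all timestamps on the same day.

```python
import re

# Matches syslog lines such as:
#   Aug 5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] entering GATHER state from 2.
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d+)\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<proc>[\w.-]+)(?:\[\d+\])?:\s+(?P<msg>.*)$"
)

# Message fragments worth surfacing; all taken from the log excerpt above.
KEYWORDS = ("token was lost", "entering ", "quorum lost", "Quorum Dissolved")


def membership_events(lines):
    """Return (time, process, message) tuples for token/state/quorum lines."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and any(key in match.group("msg") for key in KEYWORDS):
            events.append(
                (match.group("time"), match.group("proc"), match.group("msg"))
            )
    return events


def seconds_between(start, end):
    """Seconds from one HH:MM:SS timestamp to another (same day assumed)."""
    def to_seconds(stamp):
        hours, minutes, seconds = (int(part) for part in stamp.split(":"))
        return hours * 3600 + minutes * 60 + seconds
    return to_seconds(end) - to_seconds(start)
```

In the excerpt above, the token-loss line is stamped 16:41:19 and the quorum-loss lines are stamped 16:41:24, so `seconds_between("16:41:19", "16:41:24")` gives 5.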
-- Linux-cluster mailing list Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster