Re: Node is randomly fenced

Hrm, I'm not really sure I can interpret this without making guesses. I'm cc'ing one of the devs (who I hope will poke the right person if he's not able to help at the moment). Let's see what he has to say.

I am curious now, too. :)

On 12/06/14 03:02 PM, Schaefer, Micah wrote:
Node4 was fenced again. I was able to get some debug logs (below), including a new message:

"Jun 12 14:01:56 corosync [TOTEM ] The token was lost in the OPERATIONAL state."


The rest of the corosync logs:

http://pastebin.com/iYFbkbhb


Jun 12 14:44:49 corosync [TOTEM ] entering OPERATIONAL state.
Jun 12 14:44:49 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 12 14:44:49 corosync [TOTEM ] waiting_trans_ack changed to 0
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] entering GATHER state from 12.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33224 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33224 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33225 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33225 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33294 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33294 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33294 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33294 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33363 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33363 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33363 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33432 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33432 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33432 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33494 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33495 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33495 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33495 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33564 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33564 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33564 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33564 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] got commit token
Jun 12 14:44:50 corosync [TOTEM ] Saving state aru 86 high seq received 86
Jun 12 14:44:50 corosync [TOTEM ] Storing new sequence id for ring 6324
Jun 12 14:44:50 corosync [TOTEM ] entering COMMIT state.
Jun 12 14:44:50 corosync [TOTEM ] got commit token
Jun 12 14:44:50 corosync [TOTEM ] entering RECOVERY state.
Jun 12 14:44:50 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
Jun 12 14:44:50 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
Jun 12 14:44:50 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
Jun 12 14:44:50 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
Jun 12 14:44:50 corosync [TOTEM ] position [0] member 10.70.100.101:
Jun 12 14:44:50 corosync [TOTEM ] previous ring seq 25376 rep 10.70.100.101
Jun 12 14:44:50 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:50 corosync [TOTEM ] position [1] member 10.70.100.102:
Jun 12 14:44:50 corosync [TOTEM ] previous ring seq 25376 rep 10.70.100.101
Jun 12 14:44:50 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:50 corosync [TOTEM ] position [2] member 10.70.100.103:
Jun 12 14:44:50 corosync [TOTEM ] previous ring seq 25376 rep 10.70.100.101
Jun 12 14:44:50 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:50 corosync [TOTEM ] position [3] member 10.70.100.104:
Jun 12 14:44:50 corosync [TOTEM ] previous ring seq 25376 rep 10.70.100.101
Jun 12 14:44:50 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:50 corosync [TOTEM ] Did not need to originate any messages in recovery.
Jun 12 14:44:50 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru ffffffff
Jun 12 14:44:50 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:50 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
Jun 12 14:44:50 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:50 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
Jun 12 14:44:50 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:50 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
Jun 12 14:44:50 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:50 corosync [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
Jun 12 14:44:50 corosync [TOTEM ] Resetting old ring state
Jun 12 14:44:50 corosync [TOTEM ] recovery to regular 1-0
Jun 12 14:44:50 corosync [TOTEM ] waiting_trans_ack changed to 1
Jun 12 14:44:50 corosync [TOTEM ] entering OPERATIONAL state.
Jun 12 14:44:50 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 12 14:44:50 corosync [TOTEM ] waiting_trans_ack changed to 0
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34338 ms, flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] entering GATHER state from 12.
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34338 ms, flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34338 ms, flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34338 ms, flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34407 ms, flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34407 ms, flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34407 ms, flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34407 ms, flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] got commit token
Jun 12 14:44:51 corosync [TOTEM ] Saving state aru 86 high seq received 86
Jun 12 14:44:51 corosync [TOTEM ] Storing new sequence id for ring 6328
Jun 12 14:44:51 corosync [TOTEM ] entering COMMIT state.
Jun 12 14:44:51 corosync [TOTEM ] got commit token
Jun 12 14:44:51 corosync [TOTEM ] entering RECOVERY state.
Jun 12 14:44:51 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
Jun 12 14:44:51 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
Jun 12 14:44:51 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
Jun 12 14:44:51 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
Jun 12 14:44:51 corosync [TOTEM ] position [0] member 10.70.100.101:
Jun 12 14:44:51 corosync [TOTEM ] previous ring seq 25380 rep 10.70.100.101
Jun 12 14:44:51 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:51 corosync [TOTEM ] position [1] member 10.70.100.102:
Jun 12 14:44:51 corosync [TOTEM ] previous ring seq 25380 rep 10.70.100.101
Jun 12 14:44:51 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:51 corosync [TOTEM ] position [2] member 10.70.100.103:
Jun 12 14:44:51 corosync [TOTEM ] previous ring seq 25380 rep 10.70.100.101
Jun 12 14:44:51 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:51 corosync [TOTEM ] position [3] member 10.70.100.104:
Jun 12 14:44:51 corosync [TOTEM ] previous ring seq 25380 rep 10.70.100.101
Jun 12 14:44:51 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:51 corosync [TOTEM ] Did not need to originate any messages in recovery.
Jun 12 14:44:51 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru ffffffff
Jun 12 14:44:51 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:51 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
Jun 12 14:44:51 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:51 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
Jun 12 14:44:51 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:51 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
Jun 12 14:44:51 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:51 corosync [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
Jun 12 14:44:51 corosync [TOTEM ] Resetting old ring state
Jun 12 14:44:51 corosync [TOTEM ] recovery to regular 1-0
Jun 12 14:44:51 corosync [TOTEM ] waiting_trans_ack changed to 1
Jun 12 14:44:51 corosync [TOTEM ] entering OPERATIONAL state.
Jun 12 14:44:51 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 12 14:44:51 corosync [TOTEM ] waiting_trans_ack changed to 0
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35177 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] entering GATHER state from 12.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35177 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35246 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35246 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35316 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35316 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35385 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35385 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35385 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35454 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35454 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35454 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35455 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] got commit token
Jun 12 14:44:52 corosync [TOTEM ] Saving state aru 86 high seq received 86
Jun 12 14:44:52 corosync [TOTEM ] Storing new sequence id for ring 632c
Jun 12 14:44:52 corosync [TOTEM ] entering COMMIT state.
Jun 12 14:44:52 corosync [TOTEM ] got commit token
Jun 12 14:44:52 corosync [TOTEM ] entering RECOVERY state.
Jun 12 14:44:52 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
Jun 12 14:44:52 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
Jun 12 14:44:52 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
Jun 12 14:44:52 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
Jun 12 14:44:52 corosync [TOTEM ] position [0] member 10.70.100.101:
Jun 12 14:44:52 corosync [TOTEM ] previous ring seq 25384 rep 10.70.100.101
Jun 12 14:44:52 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:52 corosync [TOTEM ] position [1] member 10.70.100.102:
Jun 12 14:44:52 corosync [TOTEM ] previous ring seq 25384 rep 10.70.100.101
Jun 12 14:44:52 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:52 corosync [TOTEM ] position [2] member 10.70.100.103:
Jun 12 14:44:52 corosync [TOTEM ] previous ring seq 25384 rep 10.70.100.101
Jun 12 14:44:52 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:52 corosync [TOTEM ] position [3] member 10.70.100.104:
Jun 12 14:44:52 corosync [TOTEM ] previous ring seq 25384 rep 10.70.100.101
Jun 12 14:44:52 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:52 corosync [TOTEM ] Did not need to originate any messages in recovery.
Jun 12 14:44:52 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru ffffffff
Jun 12 14:44:52 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:52 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
Jun 12 14:44:52 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:52 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
Jun 12 14:44:52 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:52 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
Jun 12 14:44:52 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:52 corosync [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
Jun 12 14:44:52 corosync [TOTEM ] Resetting old ring state
Jun 12 14:44:52 corosync [TOTEM ] recovery to regular 1-0
Jun 12 14:44:52 corosync [TOTEM ] waiting_trans_ack changed to 1
Jun 12 14:44:52 corosync [TOTEM ] entering OPERATIONAL state.
Jun 12 14:44:52 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 12 14:44:52 corosync [TOTEM ] waiting_trans_ack changed to 0
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36223 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] entering GATHER state from 12.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36224 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36293 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36293 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36293 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36293 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36362 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36362 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36362 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36362 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36431 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36431 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36432 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36432 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36501 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36501 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36501 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36501 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] got commit token
Jun 12 14:44:53 corosync [TOTEM ] Saving state aru 86 high seq received 86
Jun 12 14:44:53 corosync [TOTEM ] Storing new sequence id for ring 6330
Jun 12 14:44:53 corosync [TOTEM ] entering COMMIT state.
Jun 12 14:44:53 corosync [TOTEM ] got commit token
Jun 12 14:44:53 corosync [TOTEM ] entering RECOVERY state.
Jun 12 14:44:53 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
Jun 12 14:44:53 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
Jun 12 14:44:53 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
Jun 12 14:44:53 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
Jun 12 14:44:53 corosync [TOTEM ] position [0] member 10.70.100.101:
Jun 12 14:44:53 corosync [TOTEM ] previous ring seq 25388 rep 10.70.100.101
Jun 12 14:44:53 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:53 corosync [TOTEM ] position [1] member 10.70.100.102:
Jun 12 14:44:53 corosync [TOTEM ] previous ring seq 25388 rep 10.70.100.101
Jun 12 14:44:53 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:53 corosync [TOTEM ] position [2] member 10.70.100.103:
Jun 12 14:44:53 corosync [TOTEM ] previous ring seq 25388 rep 10.70.100.101
Jun 12 14:44:53 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:53 corosync [TOTEM ] position [3] member 10.70.100.104:
Jun 12 14:44:53 corosync [TOTEM ] previous ring seq 25388 rep 10.70.100.101
Jun 12 14:44:53 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:53 corosync [TOTEM ] Did not need to originate any messages in recovery.
Jun 12 14:44:53 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru ffffffff
Jun 12 14:44:53 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:53 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
Jun 12 14:44:53 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:53 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
Jun 12 14:44:53 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:53 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
Jun 12 14:44:53 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:53 corosync [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
Jun 12 14:44:53 corosync [TOTEM ] Resetting old ring state
Jun 12 14:44:53 corosync [TOTEM ] recovery to regular 1-0
Jun 12 14:44:53 corosync [TOTEM ] waiting_trans_ack changed to 1
Jun 12 14:44:53 corosync [TOTEM ] entering OPERATIONAL state.
Jun 12 14:44:53 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 12 14:44:53 corosync [TOTEM ] waiting_trans_ack changed to 0
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37267 ms, flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] entering GATHER state from 12.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37267 ms, flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37268 ms, flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37268 ms, flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37337 ms, flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37337 ms, flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] got commit token
Jun 12 14:44:54 corosync [TOTEM ] Saving state aru 86 high seq received 86
Jun 12 14:44:54 corosync [TOTEM ] Storing new sequence id for ring 6334
Jun 12 14:44:54 corosync [TOTEM ] entering COMMIT state.
Jun 12 14:44:54 corosync [TOTEM ] got commit token
Jun 12 14:44:54 corosync [TOTEM ] entering RECOVERY state.
Jun 12 14:44:54 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
Jun 12 14:44:54 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
Jun 12 14:44:54 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
Jun 12 14:44:54 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
Jun 12 14:44:54 corosync [TOTEM ] position [0] member 10.70.100.101:
Jun 12 14:44:54 corosync [TOTEM ] previous ring seq 25392 rep 10.70.100.101
Jun 12 14:44:54 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:54 corosync [TOTEM ] position [1] member 10.70.100.102:
Jun 12 14:44:54 corosync [TOTEM ] previous ring seq 25392 rep 10.70.100.101
Jun 12 14:44:54 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:54 corosync [TOTEM ] position [2] member 10.70.100.103:
Jun 12 14:44:54 corosync [TOTEM ] previous ring seq 25392 rep 10.70.100.101
Jun 12 14:44:54 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:54 corosync [TOTEM ] position [3] member 10.70.100.104:
Jun 12 14:44:54 corosync [TOTEM ] previous ring seq 25392 rep 10.70.100.101
Jun 12 14:44:54 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:54 corosync [TOTEM ] Did not need to originate any messages in recovery.
Jun 12 14:44:54 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru ffffffff
Jun 12 14:44:54 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:54 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
Jun 12 14:44:54 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:54 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
Jun 12 14:44:54 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:54 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
Jun 12 14:44:54 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:54 corosync [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
Jun 12 14:44:54 corosync [TOTEM ] Resetting old ring state
Jun 12 14:44:54 corosync [TOTEM ] recovery to regular 1-0
Jun 12 14:44:54 corosync [TOTEM ] waiting_trans_ack changed to 1
Jun 12 14:44:54 corosync [TOTEM ] entering OPERATIONAL state.
Jun 12 14:44:54 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 12 14:44:54 corosync [TOTEM ] waiting_trans_ack changed to 0
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 38108 ms, flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] entering GATHER state from 12.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 38108 ms, flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 38108 ms, flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 38109 ms, flushing membership messages.
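
With this many near-identical lines, a small script is easier than reading the paste by eye. Below is a minimal Python sketch of a hypothetical helper (`summarize` is not part of corosync) that pulls the pause durations and the new ring ids out of lines in the format shown above. The regular expressions are written against these log lines only and are an assumption for other corosync versions. The ring ids in the "Storing new sequence id" lines appear to be hex; converting them lines them up with the decimal "previous ring seq" values (0x6324 is 25380).

```python
import re

# Patterns written against the corosync log lines quoted above; other
# corosync versions may word these messages differently (assumption).
PAUSE_RE = re.compile(r"Process pause detected for (\d+) ms")
RING_RE = re.compile(r"Storing new sequence id for ring ([0-9a-fA-F]+)")


def summarize(lines):
    """Collect pause durations (ms) and new ring ids from corosync log lines.

    Ring ids in "Storing new sequence id" lines appear to be hex, so they are
    converted to int for comparison with the decimal "previous ring seq" values.
    """
    pauses = []
    rings = []
    for line in lines:
        m = PAUSE_RE.search(line)
        if m:
            pauses.append(int(m.group(1)))
            continue
        m = RING_RE.search(line)
        if m:
            rings.append(int(m.group(1), 16))
    return {
        "pause_count": len(pauses),
        "pause_min_ms": min(pauses) if pauses else None,
        "pause_max_ms": max(pauses) if pauses else None,
        "rings": rings,
        # Difference between consecutive ring ids, in the order they appear.
        "ring_steps": [b - a for a, b in zip(rings, rings[1:])],
    }


if __name__ == "__main__":
    sample = [
        "Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms, flushing membership messages.",
        "Jun 12 14:44:50 corosync [TOTEM ] Storing new sequence id for ring 6324",
        "Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33564 ms, flushing membership messages.",
        "Jun 12 14:44:51 corosync [TOTEM ] Storing new sequence id for ring 6328",
    ]
    print(summarize(sample))
```

Passing an open file object instead of the sample list works the same way, since iterating over a file yields it line by line.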

On 6/12/14, 1:55 PM, "Schaefer, Micah" <Micah.Schaefer@xxxxxxxxxx> wrote:

I just found that the clock on node1 was off by about a minute and a half compared to the rest of the nodes.

I am running NTP, so I'm not sure why the time wasn't synced. I wonder: would node1, being behind, think it was not receiving updates from the other nodes?
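
One rough way to confirm a skew like node1's is to sample every node's clock at about the same moment and compare each reading to the median. This is a minimal Python sketch. `clock_offsets`, the sample readings, and the 1-second threshold are all hypothetical, and the sketch only measures the skew; it does not explain why NTP failed to correct it.

```python
from statistics import median


def clock_offsets(readings, threshold_s=1.0):
    """Compare each node's clock reading to the median of all readings.

    `readings` maps node name -> Unix time in seconds, sampled on every node
    at (roughly) the same moment. Returns (offsets, flagged), where `flagged`
    lists the nodes whose offset exceeds `threshold_s` in either direction.
    """
    ref = median(readings.values())
    offsets = {node: t - ref for node, t in readings.items()}
    flagged = sorted(node for node, off in offsets.items() if abs(off) > threshold_s)
    return offsets, flagged


if __name__ == "__main__":
    # Hypothetical readings: node1 roughly 90 s behind the other nodes.
    sample = {"node1": 910, "node2": 1000, "node3": 1000, "node4": 1001}
    offsets, flagged = clock_offsets(sample)
    print(offsets)
    print("skewed:", flagged)
```

Using the median rather than the mean keeps one badly skewed node from dragging the reference point toward itself.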

On 6/12/14, 1:29 PM, "Digimer" <lists@xxxxxxxxxx> wrote:

Even if the token changes stop the immediate fencing, please don't leave it there. There is something fundamentally wrong that you need to identify and fix.

Keep us posted!

On 12/06/14 01:24 PM, Schaefer, Micah wrote:
The servers do not run any tasks other than those in the cluster service group.

Nodes 3 and 4 are physical servers with a lot of horsepower, and nodes 1 and 2 are virtual machines with far fewer resources available.

I adjusted the token settings and will watch for any change.

On 6/12/14, 1:08 PM, "Digimer" <lists@xxxxxxxxxx> wrote:

On 12/06/14 12:48 PM, Schaefer, Micah wrote:
As far as the switches go, both are Cisco Catalyst 6509-E. No spanning tree changes are happening, and all the ports for these servers have PortFast enabled. My switch logging level is very high, and I have no messages relating to those time frames or ports.

TOTEM reports that "A processor joined or left the membership…", but that isn't enough detail.

Also note that I did not have these issues until I added two new servers, node3 and node4, to the cluster. Node1 and node2 do not fence each other (unless there is a real issue), and they are on different switches.

Then I can't imagine it being the network anymore. Seeing as both nodes 3 and 4 get fenced, it's likely not hardware either. Are the workloads on 3 and 4 much higher (or are the computers much slower) than on 1 and 2? I'm wondering if the nodes are simply not keeping up with corosync traffic. You might try adjusting the corosync token timeout and retransmit count to see if that reduces the node losses.
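
To get a sense of what adjusting the token timeout and retransmit count changes, here is a rough Python sketch of the values corosync derives from them. The formulas are my reading of corosync.conf(5) and should be treated as an assumption. With that man page's defaults (a 1000 ms token and 4 retransmits) they give its documented 238 ms retransmit interval. cman may apply its own defaults through cluster.conf, so check the documentation for your version. `derived_totem_timeouts` is a hypothetical helper, not a corosync tool.

```python
def derived_totem_timeouts(token_ms, retransmits_before_loss=4):
    """Estimate the timeouts corosync derives from `token`.

    Assumption: corosync computes the retransmit interval as
    token / (token_retransmits_before_loss_const + 0.2), and defaults
    consensus to 1.2 * token, when those values are not set explicitly.
    """
    retransmit_ms = int(token_ms / (retransmits_before_loss + 0.2))
    consensus_ms = token_ms * 12 // 10  # 1.2 * token, kept in integer math
    return {"token_retransmit": retransmit_ms, "consensus": consensus_ms}


if __name__ == "__main__":
    for token, retrans in [(1000, 4), (10000, 4), (10000, 10)]:
        print(token, retrans, derived_totem_timeouts(token, retrans))
```

Under these assumed formulas, raising `token` alone also stretches the retransmit interval. Raising the retransmit count keeps the token-loss timeout the same but retries more often within it.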

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?