Node4 was fenced again. I was able to get some debug logs (below), which include a new message:

"Jun 12 14:01:56 corosync [TOTEM ] The token was lost in the OPERATIONAL state."

Rest of the corosync logs: http://pastebin.com/iYFbkbhb

Jun 12 14:44:49 corosync [TOTEM ] entering OPERATIONAL state.
Jun 12 14:44:49 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 12 14:44:49 corosync [TOTEM ] waiting_trans_ack changed to 0
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] entering GATHER state from 12.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33224 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33224 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33225 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33225 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33294 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33294 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33294 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33294 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33363 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33363 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33363 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33432 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33432 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33432 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33494 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33495 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33495 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33495 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33564 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33564 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33564 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33564 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] got commit token
Jun 12 14:44:50 corosync [TOTEM ] Saving state aru 86 high seq received 86
Jun 12 14:44:50 corosync [TOTEM ] Storing new sequence id for ring 6324
Jun 12 14:44:50 corosync [TOTEM ] entering COMMIT state.
Jun 12 14:44:50 corosync [TOTEM ] got commit token
Jun 12 14:44:50 corosync [TOTEM ] entering RECOVERY state.
Jun 12 14:44:50 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
Jun 12 14:44:50 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
Jun 12 14:44:50 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
Jun 12 14:44:50 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
Jun 12 14:44:50 corosync [TOTEM ] position [0] member 10.70.100.101:
Jun 12 14:44:50 corosync [TOTEM ] previous ring seq 25376 rep 10.70.100.101
Jun 12 14:44:50 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:50 corosync [TOTEM ] position [1] member 10.70.100.102:
Jun 12 14:44:50 corosync [TOTEM ] previous ring seq 25376 rep 10.70.100.101
Jun 12 14:44:50 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:50 corosync [TOTEM ] position [2] member 10.70.100.103:
Jun 12 14:44:50 corosync [TOTEM ] previous ring seq 25376 rep 10.70.100.101
Jun 12 14:44:50 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:50 corosync [TOTEM ] position [3] member 10.70.100.104:
Jun 12 14:44:50 corosync [TOTEM ] previous ring seq 25376 rep 10.70.100.101
Jun 12 14:44:50 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:50 corosync [TOTEM ] Did not need to originate any messages in recovery.
Jun 12 14:44:50 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru ffffffff
Jun 12 14:44:50 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:50 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
Jun 12 14:44:50 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:50 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
Jun 12 14:44:50 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:50 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
Jun 12 14:44:50 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:50 corosync [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
Jun 12 14:44:50 corosync [TOTEM ] Resetting old ring state
Jun 12 14:44:50 corosync [TOTEM ] recovery to regular 1-0
Jun 12 14:44:50 corosync [TOTEM ] waiting_trans_ack changed to 1
Jun 12 14:44:50 corosync [TOTEM ] entering OPERATIONAL state.
Jun 12 14:44:50 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 12 14:44:50 corosync [TOTEM ] waiting_trans_ack changed to 0
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34338 ms, flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] entering GATHER state from 12.
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34338 ms, flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34338 ms, flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34338 ms, flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34407 ms, flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34407 ms, flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34407 ms, flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34407 ms, flushing membership messages.
Jun 12 14:44:51 corosync [TOTEM ] got commit token
Jun 12 14:44:51 corosync [TOTEM ] Saving state aru 86 high seq received 86
Jun 12 14:44:51 corosync [TOTEM ] Storing new sequence id for ring 6328
Jun 12 14:44:51 corosync [TOTEM ] entering COMMIT state.
Jun 12 14:44:51 corosync [TOTEM ] got commit token
Jun 12 14:44:51 corosync [TOTEM ] entering RECOVERY state.
Jun 12 14:44:51 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
Jun 12 14:44:51 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
Jun 12 14:44:51 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
Jun 12 14:44:51 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
Jun 12 14:44:51 corosync [TOTEM ] position [0] member 10.70.100.101:
Jun 12 14:44:51 corosync [TOTEM ] previous ring seq 25380 rep 10.70.100.101
Jun 12 14:44:51 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:51 corosync [TOTEM ] position [1] member 10.70.100.102:
Jun 12 14:44:51 corosync [TOTEM ] previous ring seq 25380 rep 10.70.100.101
Jun 12 14:44:51 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:51 corosync [TOTEM ] position [2] member 10.70.100.103:
Jun 12 14:44:51 corosync [TOTEM ] previous ring seq 25380 rep 10.70.100.101
Jun 12 14:44:51 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:51 corosync [TOTEM ] position [3] member 10.70.100.104:
Jun 12 14:44:51 corosync [TOTEM ] previous ring seq 25380 rep 10.70.100.101
Jun 12 14:44:51 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:51 corosync [TOTEM ] Did not need to originate any messages in recovery.
Jun 12 14:44:51 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru ffffffff
Jun 12 14:44:51 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:51 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
Jun 12 14:44:51 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:51 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
Jun 12 14:44:51 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:51 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
Jun 12 14:44:51 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:51 corosync [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
Jun 12 14:44:51 corosync [TOTEM ] Resetting old ring state
Jun 12 14:44:51 corosync [TOTEM ] recovery to regular 1-0
Jun 12 14:44:51 corosync [TOTEM ] waiting_trans_ack changed to 1
Jun 12 14:44:51 corosync [TOTEM ] entering OPERATIONAL state.
Jun 12 14:44:51 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 12 14:44:51 corosync [TOTEM ] waiting_trans_ack changed to 0
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35177 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] entering GATHER state from 12.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35177 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35246 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35246 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35316 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35316 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35385 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35385 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35385 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35454 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35454 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35454 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35455 ms, flushing membership messages.
Jun 12 14:44:52 corosync [TOTEM ] got commit token
Jun 12 14:44:52 corosync [TOTEM ] Saving state aru 86 high seq received 86
Jun 12 14:44:52 corosync [TOTEM ] Storing new sequence id for ring 632c
Jun 12 14:44:52 corosync [TOTEM ] entering COMMIT state.
Jun 12 14:44:52 corosync [TOTEM ] got commit token
Jun 12 14:44:52 corosync [TOTEM ] entering RECOVERY state.
Jun 12 14:44:52 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
Jun 12 14:44:52 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
Jun 12 14:44:52 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
Jun 12 14:44:52 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
Jun 12 14:44:52 corosync [TOTEM ] position [0] member 10.70.100.101:
Jun 12 14:44:52 corosync [TOTEM ] previous ring seq 25384 rep 10.70.100.101
Jun 12 14:44:52 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:52 corosync [TOTEM ] position [1] member 10.70.100.102:
Jun 12 14:44:52 corosync [TOTEM ] previous ring seq 25384 rep 10.70.100.101
Jun 12 14:44:52 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:52 corosync [TOTEM ] position [2] member 10.70.100.103:
Jun 12 14:44:52 corosync [TOTEM ] previous ring seq 25384 rep 10.70.100.101
Jun 12 14:44:52 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:52 corosync [TOTEM ] position [3] member 10.70.100.104:
Jun 12 14:44:52 corosync [TOTEM ] previous ring seq 25384 rep 10.70.100.101
Jun 12 14:44:52 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:52 corosync [TOTEM ] Did not need to originate any messages in recovery.
Jun 12 14:44:52 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru ffffffff
Jun 12 14:44:52 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:52 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
Jun 12 14:44:52 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:52 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
Jun 12 14:44:52 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:52 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
Jun 12 14:44:52 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:52 corosync [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
Jun 12 14:44:52 corosync [TOTEM ] Resetting old ring state
Jun 12 14:44:52 corosync [TOTEM ] recovery to regular 1-0
Jun 12 14:44:52 corosync [TOTEM ] waiting_trans_ack changed to 1
Jun 12 14:44:52 corosync [TOTEM ] entering OPERATIONAL state.
Jun 12 14:44:52 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 12 14:44:52 corosync [TOTEM ] waiting_trans_ack changed to 0
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36223 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] entering GATHER state from 12.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36224 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36293 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36293 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36293 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36293 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36362 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36362 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36362 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36362 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36431 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36431 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36432 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36432 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36501 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36501 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36501 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36501 ms, flushing membership messages.
Jun 12 14:44:53 corosync [TOTEM ] got commit token
Jun 12 14:44:53 corosync [TOTEM ] Saving state aru 86 high seq received 86
Jun 12 14:44:53 corosync [TOTEM ] Storing new sequence id for ring 6330
Jun 12 14:44:53 corosync [TOTEM ] entering COMMIT state.
Jun 12 14:44:53 corosync [TOTEM ] got commit token
Jun 12 14:44:53 corosync [TOTEM ] entering RECOVERY state.
Jun 12 14:44:53 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
Jun 12 14:44:53 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
Jun 12 14:44:53 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
Jun 12 14:44:53 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
Jun 12 14:44:53 corosync [TOTEM ] position [0] member 10.70.100.101:
Jun 12 14:44:53 corosync [TOTEM ] previous ring seq 25388 rep 10.70.100.101
Jun 12 14:44:53 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:53 corosync [TOTEM ] position [1] member 10.70.100.102:
Jun 12 14:44:53 corosync [TOTEM ] previous ring seq 25388 rep 10.70.100.101
Jun 12 14:44:53 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:53 corosync [TOTEM ] position [2] member 10.70.100.103:
Jun 12 14:44:53 corosync [TOTEM ] previous ring seq 25388 rep 10.70.100.101
Jun 12 14:44:53 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:53 corosync [TOTEM ] position [3] member 10.70.100.104:
Jun 12 14:44:53 corosync [TOTEM ] previous ring seq 25388 rep 10.70.100.101
Jun 12 14:44:53 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:53 corosync [TOTEM ] Did not need to originate any messages in recovery.
Jun 12 14:44:53 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru ffffffff
Jun 12 14:44:53 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:53 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
Jun 12 14:44:53 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:53 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
Jun 12 14:44:53 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:53 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
Jun 12 14:44:53 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:53 corosync [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
Jun 12 14:44:53 corosync [TOTEM ] Resetting old ring state
Jun 12 14:44:53 corosync [TOTEM ] recovery to regular 1-0
Jun 12 14:44:53 corosync [TOTEM ] waiting_trans_ack changed to 1
Jun 12 14:44:53 corosync [TOTEM ] entering OPERATIONAL state.
Jun 12 14:44:53 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 12 14:44:53 corosync [TOTEM ] waiting_trans_ack changed to 0
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37267 ms, flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] entering GATHER state from 12.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37267 ms, flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37268 ms, flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37268 ms, flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37337 ms, flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37337 ms, flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] got commit token
Jun 12 14:44:54 corosync [TOTEM ] Saving state aru 86 high seq received 86
Jun 12 14:44:54 corosync [TOTEM ] Storing new sequence id for ring 6334
Jun 12 14:44:54 corosync [TOTEM ] entering COMMIT state.
Jun 12 14:44:54 corosync [TOTEM ] got commit token
Jun 12 14:44:54 corosync [TOTEM ] entering RECOVERY state.
Jun 12 14:44:54 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
Jun 12 14:44:54 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
Jun 12 14:44:54 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
Jun 12 14:44:54 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
Jun 12 14:44:54 corosync [TOTEM ] position [0] member 10.70.100.101:
Jun 12 14:44:54 corosync [TOTEM ] previous ring seq 25392 rep 10.70.100.101
Jun 12 14:44:54 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:54 corosync [TOTEM ] position [1] member 10.70.100.102:
Jun 12 14:44:54 corosync [TOTEM ] previous ring seq 25392 rep 10.70.100.101
Jun 12 14:44:54 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:54 corosync [TOTEM ] position [2] member 10.70.100.103:
Jun 12 14:44:54 corosync [TOTEM ] previous ring seq 25392 rep 10.70.100.101
Jun 12 14:44:54 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:54 corosync [TOTEM ] position [3] member 10.70.100.104:
Jun 12 14:44:54 corosync [TOTEM ] previous ring seq 25392 rep 10.70.100.101
Jun 12 14:44:54 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
Jun 12 14:44:54 corosync [TOTEM ] Did not need to originate any messages in recovery.
Jun 12 14:44:54 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru ffffffff
Jun 12 14:44:54 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:54 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
Jun 12 14:44:54 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:54 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
Jun 12 14:44:54 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:54 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
Jun 12 14:44:54 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
Jun 12 14:44:54 corosync [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
Jun 12 14:44:54 corosync [TOTEM ] Resetting old ring state
Jun 12 14:44:54 corosync [TOTEM ] recovery to regular 1-0
Jun 12 14:44:54 corosync [TOTEM ] waiting_trans_ack changed to 1
Jun 12 14:44:54 corosync [TOTEM ] entering OPERATIONAL state.
Jun 12 14:44:54 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 12 14:44:54 corosync [TOTEM ] waiting_trans_ack changed to 0
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 38108 ms, flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] entering GATHER state from 12.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 38108 ms, flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 38108 ms, flushing membership messages.
Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 38109 ms, flushing membership messages.

On 6/12/14, 1:55 PM, "Schaefer, Micah" <Micah.Schaefer@xxxxxxxxxx> wrote:

>I just found that the clock on node1 was off by about a minute and a half
>compared to the rest of the nodes.
>
>I am running ntp, so not sure why the time wasn’t synced up.
>Wonder if node1 being behind, would think it was not receiving updates
>from the other nodes?
>
>On 6/12/14, 1:29 PM, "Digimer" <lists@xxxxxxxxxx> wrote:
>
>>Even if the token changes stop the immediate fencing, don't leave it
>>please. There is something fundamentally wrong that you need to
>>identify/fix.
>>
>>Keep us posted!
>>
>>On 12/06/14 01:24 PM, Schaefer, Micah wrote:
>>> The servers do not run any tasks other than the tasks in the cluster
>>> service group.
>>>
>>> Nodes 3 and 4 are physical servers with a lot of horsepower and nodes 1
>>> and 2 are virtual machines with much less resources available.
>>>
>>> I adjusted the token settings and will watch for any change.
>>>
>>> On 6/12/14, 1:08 PM, "Digimer" <lists@xxxxxxxxxx> wrote:
>>>
>>>> On 12/06/14 12:48 PM, Schaefer, Micah wrote:
>>>>> As far as the switch goes, both are Cisco Catalyst 6509-E, no spanning
>>>>> tree changes are happening and all the ports have port-fast enabled for
>>>>> these servers. My switch logging level is very high and I have no
>>>>> messages in relation to the time frames or ports.
>>>>>
>>>>> TOTEM reports that “A processor joined or left the membership…”, but
>>>>> that isn’t enough detail.
>>>>>
>>>>> Also note that I did not have these issues until adding new servers:
>>>>> node3 and node4 to the cluster. Node1 and node2 do not fence each other
>>>>> (unless a real issue is there), and they are on different switches.
>>>>
>>>> Then I can't imagine it being network anymore. Seeing as both node 3 and
>>>> 4 get fenced, it's likely not hardware either. Are the workloads on 3
>>>> and 4 much higher (or are the computers much slower) than 1 and 2? I'm
>>>> wondering if the nodes are simply not keeping up with corosync traffic.
>>>> You might try adjusting the corosync token timeout and retransmit counts
>>>> to see if that reduces the node loses.
>>>>
>>>> --
>>>> Digimer
>>>> Papers and Projects: https://alteeve.ca/w/
>>>> What if the cure for cancer is trapped in the mind of a person without
>>>> access to education?

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
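
For anyone working through the log above, here is a minimal Python sketch that pulls the "Process pause detected" entries out of the text and subtracts each reported duration from its timestamp. It assumes the syslog format shown in this post and assumes the year is 2014 (syslog timestamps carry no year); the names PAUSE_RE and pause_events are made up for illustration. Reading the result as the moment from which corosync is measuring the pause is my interpretation, not something confirmed in this thread.

```python
import re
from datetime import datetime, timedelta

# Matches entries such as:
#   Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms, ...
# No line anchors are used, so it also works on text where the
# log entries have been run together on one line.
PAUSE_RE = re.compile(
    r"(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) corosync "
    r"\[TOTEM\s*\] Process pause detected for (?P<ms>\d+) ms"
)


def pause_events(text, year=2014):
    """Return a list of (logged_at, pause_ms, estimated_start) tuples.

    `year` is an assumption: the syslog timestamp does not include one.
    """
    events = []
    for m in PAUSE_RE.finditer(text):
        logged_at = datetime.strptime(
            f"{year} {m.group('stamp')}", "%Y %b %d %H:%M:%S"
        )
        pause_ms = int(m.group("ms"))
        # Rough estimate only: the log timestamp has 1 s resolution.
        estimated_start = logged_at - timedelta(milliseconds=pause_ms)
        events.append((logged_at, pause_ms, estimated_start))
    return events


if __name__ == "__main__":
    sample = (
        "Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for "
        "32947 ms, flushing membership messages.\n"
        "Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for "
        "38108 ms, flushing membership messages.\n"
    )
    for logged_at, pause_ms, start in pause_events(sample):
        print(logged_at.time(), pause_ms, start.time())
```

On the entries in this post the estimates all fall between roughly 14:44:15 and 14:44:17, so the reported pause grows about as fast as the clock advances, which may be worth keeping in mind alongside the clock offset found on node1.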