On 06/12/2014 09:06 PM, Digimer wrote:
> Hrm, I'm not really sure that I am able to interpret this without making
> guesses. I'm cc'ing one of the devs (who I hope will poke the right
> person if he's not able to help at the moment). Let's see what he has to
> say.
>
> I am curious now, too. :)

Chrissie/Honza: can you please take a look at this thread and see if
there is a latent bug?

I find it odd that "Process pause detected" is kicking in so many times
without a fencing action.

Fabio

>
> On 12/06/14 03:02 PM, Schaefer, Micah wrote:
>> Node4 was fenced again, I was able to get some debug logs (below), a new
>> message :
>>
>> "Jun 12 14:01:56 corosync [TOTEM ] The token was lost in the OPERATIONAL state."
>>
>> Rest of corosync logs
>>
>> http://pastebin.com/iYFbkbhb
>>
>> Jun 12 14:44:49 corosync [TOTEM ] entering OPERATIONAL state.
>> Jun 12 14:44:49 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
>> Jun 12 14:44:49 corosync [TOTEM ] waiting_trans_ack changed to 0
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms, flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] entering GATHER state from 12.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms, flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms, flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms, flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms, flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms, flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms, flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms, flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms, flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms, flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms, flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms, flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms, flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms, flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33224 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33224 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33225 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33225 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33294 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33294 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33294 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33294 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33363 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33363 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33363 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33432 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33432 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33432 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33494 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33495 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33495 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33495 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33564 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33564 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33564 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33564 ms, flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] got commit token
>> Jun 12 14:44:50 corosync [TOTEM ] Saving state aru 86 high seq received 86
>> Jun 12 14:44:50 corosync [TOTEM ] Storing new sequence id for ring 6324
>> Jun 12 14:44:50 corosync [TOTEM ] entering COMMIT state.
>> Jun 12 14:44:50 corosync [TOTEM ] got commit token
>> Jun 12 14:44:50 corosync [TOTEM ] entering RECOVERY state.
>> Jun 12 14:44:50 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
>> Jun 12 14:44:50 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
>> Jun 12 14:44:50 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
>> Jun 12 14:44:50 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
>> Jun 12 14:44:50 corosync [TOTEM ] position [0] member 10.70.100.101:
>> Jun 12 14:44:50 corosync [TOTEM ] previous ring seq 25376 rep 10.70.100.101
>> Jun 12 14:44:50 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
>> Jun 12 14:44:50 corosync [TOTEM ] position [1] member 10.70.100.102:
>> Jun 12 14:44:50 corosync [TOTEM ] previous ring seq 25376 rep 10.70.100.101
>> Jun 12 14:44:50 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
>> Jun 12 14:44:50 corosync [TOTEM ] position [2] member 10.70.100.103:
>> Jun 12 14:44:50 corosync [TOTEM ] previous ring seq 25376 rep 10.70.100.101
>> Jun 12 14:44:50 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
>> Jun 12 14:44:50 corosync [TOTEM ] position [3] member 10.70.100.104:
>> Jun 12 14:44:50 corosync [TOTEM ] previous ring seq 25376 rep 10.70.100.101
>> Jun 12 14:44:50 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
>> Jun 12 14:44:50 corosync [TOTEM ] Did not need to originate any messages in recovery.
>> Jun 12 14:44:50 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru ffffffff
>> Jun 12 14:44:50 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
>> Jun 12 14:44:50 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
>> Jun 12 14:44:50 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
>> Jun 12 14:44:50 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
>> Jun 12 14:44:50 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
>> Jun 12 14:44:50 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
>> Jun 12 14:44:50 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
>> Jun 12 14:44:50 corosync [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
>> Jun 12 14:44:50 corosync [TOTEM ] Resetting old ring state
>> Jun 12 14:44:50 corosync [TOTEM ] recovery to regular 1-0
>> Jun 12 14:44:50 corosync [TOTEM ] waiting_trans_ack changed to 1
>> Jun 12 14:44:50 corosync [TOTEM ] entering OPERATIONAL state.
>> Jun 12 14:44:50 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
>> Jun 12 14:44:50 corosync [TOTEM ] waiting_trans_ack changed to 0
>> Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34338 ms, flushing membership messages.
>> Jun 12 14:44:51 corosync [TOTEM ] entering GATHER state from 12.
>> Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34338 ms, flushing membership messages.
>> Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34338 ms, flushing membership messages.
>> Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34338 ms, flushing membership messages.
>> Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34407 ms, flushing membership messages.
>> Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34407 ms, flushing membership messages.
>> Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34407 ms, flushing membership messages.
>> Jun 12 14:44:51 corosync [TOTEM ] Process pause detected for 34407 ms, flushing membership messages.
>> Jun 12 14:44:51 corosync [TOTEM ] got commit token
>> Jun 12 14:44:51 corosync [TOTEM ] Saving state aru 86 high seq received 86
>> Jun 12 14:44:51 corosync [TOTEM ] Storing new sequence id for ring 6328
>> Jun 12 14:44:51 corosync [TOTEM ] entering COMMIT state.
>> Jun 12 14:44:51 corosync [TOTEM ] got commit token
>> Jun 12 14:44:51 corosync [TOTEM ] entering RECOVERY state.
>> Jun 12 14:44:51 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
>> Jun 12 14:44:51 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
>> Jun 12 14:44:51 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
>> Jun 12 14:44:51 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
>> Jun 12 14:44:51 corosync [TOTEM ] position [0] member 10.70.100.101:
>> Jun 12 14:44:51 corosync [TOTEM ] previous ring seq 25380 rep 10.70.100.101
>> Jun 12 14:44:51 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
>> Jun 12 14:44:51 corosync [TOTEM ] position [1] member 10.70.100.102:
>> Jun 12 14:44:51 corosync [TOTEM ] previous ring seq 25380 rep 10.70.100.101
>> Jun 12 14:44:51 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
>> Jun 12 14:44:51 corosync [TOTEM ] position [2] member 10.70.100.103:
>> Jun 12 14:44:51 corosync [TOTEM ] previous ring seq 25380 rep 10.70.100.101
>> Jun 12 14:44:51 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
>> Jun 12 14:44:51 corosync [TOTEM ] position [3] member 10.70.100.104:
>> Jun 12 14:44:51 corosync [TOTEM ] previous ring seq 25380 rep 10.70.100.101
>> Jun 12 14:44:51 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
>> Jun 12 14:44:51 corosync [TOTEM ] Did not need to originate any messages in recovery.
>> Jun 12 14:44:51 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru ffffffff
>> Jun 12 14:44:51 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
>> Jun 12 14:44:51 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
>> Jun 12 14:44:51 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
>> Jun 12 14:44:51 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
>> Jun 12 14:44:51 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
>> Jun 12 14:44:51 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
>> Jun 12 14:44:51 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
>> Jun 12 14:44:51 corosync [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
>> Jun 12 14:44:51 corosync [TOTEM ] Resetting old ring state
>> Jun 12 14:44:51 corosync [TOTEM ] recovery to regular 1-0
>> Jun 12 14:44:51 corosync [TOTEM ] waiting_trans_ack changed to 1
>> Jun 12 14:44:51 corosync [TOTEM ] entering OPERATIONAL state.
>> Jun 12 14:44:51 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
>> Jun 12 14:44:51 corosync [TOTEM ] waiting_trans_ack changed to 0
>> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35177 ms, flushing membership messages.
>> Jun 12 14:44:52 corosync [TOTEM ] entering GATHER state from 12.
>> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35177 ms, flushing membership messages.
>> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35246 ms, flushing membership messages.
>> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35246 ms, flushing membership messages.
>> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35316 ms, flushing membership messages.
>> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35316 ms, flushing membership messages.
>> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35385 ms, flushing membership messages.
>> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35385 ms, flushing membership messages.
>> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35385 ms, flushing membership messages.
>> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35454 ms, flushing membership messages.
>> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35454 ms, flushing membership messages.
>> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35454 ms, flushing membership messages.
>> Jun 12 14:44:52 corosync [TOTEM ] Process pause detected for 35455 ms, flushing membership messages.
>> Jun 12 14:44:52 corosync [TOTEM ] got commit token
>> Jun 12 14:44:52 corosync [TOTEM ] Saving state aru 86 high seq received 86
>> Jun 12 14:44:52 corosync [TOTEM ] Storing new sequence id for ring 632c
>> Jun 12 14:44:52 corosync [TOTEM ] entering COMMIT state.
>> Jun 12 14:44:52 corosync [TOTEM ] got commit token
>> Jun 12 14:44:52 corosync [TOTEM ] entering RECOVERY state.
>> Jun 12 14:44:52 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
>> Jun 12 14:44:52 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
>> Jun 12 14:44:52 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
>> Jun 12 14:44:52 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
>> Jun 12 14:44:52 corosync [TOTEM ] position [0] member 10.70.100.101:
>> Jun 12 14:44:52 corosync [TOTEM ] previous ring seq 25384 rep 10.70.100.101
>> Jun 12 14:44:52 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
>> Jun 12 14:44:52 corosync [TOTEM ] position [1] member 10.70.100.102:
>> Jun 12 14:44:52 corosync [TOTEM ] previous ring seq 25384 rep 10.70.100.101
>> Jun 12 14:44:52 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
>> Jun 12 14:44:52 corosync [TOTEM ] position [2] member 10.70.100.103:
>> Jun 12 14:44:52 corosync [TOTEM ] previous ring seq 25384 rep 10.70.100.101
>> Jun 12 14:44:52 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
>> Jun 12 14:44:52 corosync [TOTEM ] position [3] member 10.70.100.104:
>> Jun 12 14:44:52 corosync [TOTEM ] previous ring seq 25384 rep 10.70.100.101
>> Jun 12 14:44:52 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
>> Jun 12 14:44:52 corosync [TOTEM ] Did not need to originate any messages in recovery.
>> Jun 12 14:44:52 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru ffffffff
>> Jun 12 14:44:52 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
>> Jun 12 14:44:52 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
>> Jun 12 14:44:52 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
>> Jun 12 14:44:52 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
>> Jun 12 14:44:52 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
>> Jun 12 14:44:52 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
>> Jun 12 14:44:52 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
>> Jun 12 14:44:52 corosync [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
>> Jun 12 14:44:52 corosync [TOTEM ] Resetting old ring state
>> Jun 12 14:44:52 corosync [TOTEM ] recovery to regular 1-0
>> Jun 12 14:44:52 corosync [TOTEM ] waiting_trans_ack changed to 1
>> Jun 12 14:44:52 corosync [TOTEM ] entering OPERATIONAL state.
>> Jun 12 14:44:52 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
>> Jun 12 14:44:52 corosync [TOTEM ] waiting_trans_ack changed to 0
>> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36223 ms, flushing membership messages.
>> Jun 12 14:44:53 corosync [TOTEM ] entering GATHER state from 12.
>> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36224 ms, flushing membership messages.
>> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36293 ms, flushing membership messages.
>> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36293 ms, flushing membership messages.
>> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36293 ms, flushing membership messages.
>> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36293 ms, flushing membership messages.
>> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36362 ms, flushing membership messages.
>> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36362 ms, flushing membership messages.
>> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36362 ms, flushing membership messages.
>> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36362 ms, flushing membership messages.
>> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36431 ms, flushing membership messages.
>> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36431 ms, flushing membership messages.
>> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36432 ms, flushing membership messages.
>> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36432 ms, flushing membership messages.
>> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36501 ms, flushing membership messages.
>> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36501 ms, flushing membership messages.
>> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36501 ms, flushing membership messages.
>> Jun 12 14:44:53 corosync [TOTEM ] Process pause detected for 36501 ms, flushing membership messages.
>> Jun 12 14:44:53 corosync [TOTEM ] got commit token
>> Jun 12 14:44:53 corosync [TOTEM ] Saving state aru 86 high seq received 86
>> Jun 12 14:44:53 corosync [TOTEM ] Storing new sequence id for ring 6330
>> Jun 12 14:44:53 corosync [TOTEM ] entering COMMIT state.
>> Jun 12 14:44:53 corosync [TOTEM ] got commit token
>> Jun 12 14:44:53 corosync [TOTEM ] entering RECOVERY state.
>> Jun 12 14:44:53 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
>> Jun 12 14:44:53 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
>> Jun 12 14:44:53 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
>> Jun 12 14:44:53 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
>> Jun 12 14:44:53 corosync [TOTEM ] position [0] member 10.70.100.101:
>> Jun 12 14:44:53 corosync [TOTEM ] previous ring seq 25388 rep 10.70.100.101
>> Jun 12 14:44:53 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
>> Jun 12 14:44:53 corosync [TOTEM ] position [1] member 10.70.100.102:
>> Jun 12 14:44:53 corosync [TOTEM ] previous ring seq 25388 rep 10.70.100.101
>> Jun 12 14:44:53 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
>> Jun 12 14:44:53 corosync [TOTEM ] position [2] member 10.70.100.103:
>> Jun 12 14:44:53 corosync [TOTEM ] previous ring seq 25388 rep 10.70.100.101
>> Jun 12 14:44:53 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
>> Jun 12 14:44:53 corosync [TOTEM ] position [3] member 10.70.100.104:
>> Jun 12 14:44:53 corosync [TOTEM ] previous ring seq 25388 rep 10.70.100.101
>> Jun 12 14:44:53 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
>> Jun 12 14:44:53 corosync [TOTEM ] Did not need to originate any messages in recovery.
>> Jun 12 14:44:53 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru ffffffff
>> Jun 12 14:44:53 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
>> Jun 12 14:44:53 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
>> Jun 12 14:44:53 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
>> Jun 12 14:44:53 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
>> Jun 12 14:44:53 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
>> Jun 12 14:44:53 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
>> Jun 12 14:44:53 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
>> Jun 12 14:44:53 corosync [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
>> Jun 12 14:44:53 corosync [TOTEM ] Resetting old ring state
>> Jun 12 14:44:53 corosync [TOTEM ] recovery to regular 1-0
>> Jun 12 14:44:53 corosync [TOTEM ] waiting_trans_ack changed to 1
>> Jun 12 14:44:53 corosync [TOTEM ] entering OPERATIONAL state.
>> Jun 12 14:44:53 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
>> Jun 12 14:44:53 corosync [TOTEM ] waiting_trans_ack changed to 0
>> Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37267 ms, flushing membership messages.
>> Jun 12 14:44:54 corosync [TOTEM ] entering GATHER state from 12.
>> Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37267 ms, flushing membership messages.
>> Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37268 ms, flushing membership messages.
>> Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37268 ms, flushing membership messages.
>> Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37337 ms, flushing membership messages.
>> Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 37337 ms, flushing membership messages.
>> Jun 12 14:44:54 corosync [TOTEM ] got commit token
>> Jun 12 14:44:54 corosync [TOTEM ] Saving state aru 86 high seq received 86
>> Jun 12 14:44:54 corosync [TOTEM ] Storing new sequence id for ring 6334
>> Jun 12 14:44:54 corosync [TOTEM ] entering COMMIT state.
>> Jun 12 14:44:54 corosync [TOTEM ] got commit token
>> Jun 12 14:44:54 corosync [TOTEM ] entering RECOVERY state.
>> Jun 12 14:44:54 corosync [TOTEM ] TRANS [0] member 10.70.100.101:
>> Jun 12 14:44:54 corosync [TOTEM ] TRANS [1] member 10.70.100.102:
>> Jun 12 14:44:54 corosync [TOTEM ] TRANS [2] member 10.70.100.103:
>> Jun 12 14:44:54 corosync [TOTEM ] TRANS [3] member 10.70.100.104:
>> Jun 12 14:44:54 corosync [TOTEM ] position [0] member 10.70.100.101:
>> Jun 12 14:44:54 corosync [TOTEM ] previous ring seq 25392 rep 10.70.100.101
>> Jun 12 14:44:54 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
>> Jun 12 14:44:54 corosync [TOTEM ] position [1] member 10.70.100.102:
>> Jun 12 14:44:54 corosync [TOTEM ] previous ring seq 25392 rep 10.70.100.101
>> Jun 12 14:44:54 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
>> Jun 12 14:44:54 corosync [TOTEM ] position [2] member 10.70.100.103:
>> Jun 12 14:44:54 corosync [TOTEM ] previous ring seq 25392 rep 10.70.100.101
>> Jun 12 14:44:54 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
>> Jun 12 14:44:54 corosync [TOTEM ] position [3] member 10.70.100.104:
>> Jun 12 14:44:54 corosync [TOTEM ] previous ring seq 25392 rep 10.70.100.101
>> Jun 12 14:44:54 corosync [TOTEM ] aru 86 high delivered 86 received flag 1
>> Jun 12 14:44:54 corosync [TOTEM ] Did not need to originate any messages in recovery.
>> Jun 12 14:44:54 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru ffffffff
>> Jun 12 14:44:54 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
>> Jun 12 14:44:54 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
>> Jun 12 14:44:54 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
>> Jun 12 14:44:54 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
>> Jun 12 14:44:54 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
>> Jun 12 14:44:54 corosync [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
>> Jun 12 14:44:54 corosync [TOTEM ] install seq 0 aru 0 high seq received 0
>> Jun 12 14:44:54 corosync [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
>> Jun 12 14:44:54 corosync [TOTEM ] Resetting old ring state
>> Jun 12 14:44:54 corosync [TOTEM ] recovery to regular 1-0
>> Jun 12 14:44:54 corosync [TOTEM ] waiting_trans_ack changed to 1
>> Jun 12 14:44:54 corosync [TOTEM ] entering OPERATIONAL state.
>> Jun 12 14:44:54 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
>> Jun 12 14:44:54 corosync [TOTEM ] waiting_trans_ack changed to 0
>> Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 38108 ms, flushing membership messages.
>> Jun 12 14:44:54 corosync [TOTEM ] entering GATHER state from 12.
>> Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 38108 ms, flushing membership messages.
>> Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 38108 ms, flushing membership messages.
>> Jun 12 14:44:54 corosync [TOTEM ] Process pause detected for 38109 ms, flushing membership messages.
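
A note for anyone reading the log excerpt above: the "Process pause detected" entries are easier to judge when tallied per second. What follows is only a minimal Python sketch, not anything shipped with corosync or posted in this thread. The function name summarize_pauses and the regular expression are my own assumptions, written against the line format quoted above; mail quote markers and wrapped lines are tolerated.

```python
import re

# Assumed line shape, taken from the excerpt quoted in this mail:
#   "Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms, ..."
PAUSE_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+corosync\s+"
    r"\[TOTEM\s*\]\s+Process pause detected for (?P<ms>\d+) ms"
)


def summarize_pauses(text):
    """Group pause reports by log timestamp.

    Returns a dict mapping each timestamp string to a tuple
    (count, min_ms, max_ms), in first-seen order.
    """
    summary = {}
    # finditer scans the whole text, so it does not matter whether the
    # log has one entry per line or was flattened into long lines.
    for match in PAUSE_RE.finditer(text):
        ts = match.group("ts")
        ms = int(match.group("ms"))
        count, lo, hi = summary.get(ts, (0, ms, ms))
        summary[ts] = (count + 1, min(lo, ms), max(hi, ms))
    return summary


def format_summary(summary):
    """Render the summary as one plain-text line per timestamp."""
    return [
        f"{ts}  reports={count}  min={lo} ms  max={hi} ms"
        for ts, (count, lo, hi) in summary.items()
    ]
```

It could be called as format_summary(summarize_pauses(open(path).read())), where path is wherever corosync writes its log on the node (an assumption to adjust locally). In the excerpt above the reported pause seems to grow roughly in step with the log timestamps, about one second per second, which may be a useful detail for the developers.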
>>
>> On 6/12/14, 1:55 PM, "Schaefer, Micah" <Micah.Schaefer@xxxxxxxxxx> wrote:
>>
>>> I just found that the clock on node1 was off by about a minute and a
>>> half compared to the rest of the nodes.
>>>
>>> I am running ntp, so I'm not sure why the time wasn't synced up. I
>>> wonder if node1, being behind, would think it was not receiving
>>> updates from the other nodes?
>>>
>>> On 6/12/14, 1:29 PM, "Digimer" <lists@xxxxxxxxxx> wrote:
>>>
>>>> Even if the token changes stop the immediate fencing, don't leave it
>>>> please. There is something fundamentally wrong that you need to
>>>> identify/fix.
>>>>
>>>> Keep us posted!
>>>>
>>>> On 12/06/14 01:24 PM, Schaefer, Micah wrote:
>>>>> The servers do not run any tasks other than the tasks in the cluster
>>>>> service group.
>>>>>
>>>>> Nodes 3 and 4 are physical servers with a lot of horsepower and
>>>>> nodes 1 and 2 are virtual machines with far fewer resources
>>>>> available.
>>>>>
>>>>> I adjusted the token settings and will watch for any change.
>>>>>
>>>>> On 6/12/14, 1:08 PM, "Digimer" <lists@xxxxxxxxxx> wrote:
>>>>>
>>>>>> On 12/06/14 12:48 PM, Schaefer, Micah wrote:
>>>>>>> As far as the switch goes, both are Cisco Catalyst 6509-E, no
>>>>>>> spanning tree changes are happening and all the ports have
>>>>>>> port-fast enabled for these servers. My switch logging level is
>>>>>>> very high and I have no messages in relation to the time frames
>>>>>>> or ports.
>>>>>>>
>>>>>>> TOTEM reports that "A processor joined or left the
>>>>>>> membership...", but that isn't enough detail.
>>>>>>>
>>>>>>> Also note that I did not have these issues until adding new
>>>>>>> servers: node3 and node4 to the cluster. Node1 and node2 do not
>>>>>>> fence each other (unless a real issue is there), and they are on
>>>>>>> different switches.
>>>>>>
>>>>>> Then I can't imagine it being network anymore. Seeing as both node 3
>>>>>> and 4 get fenced, it's likely not hardware either. Are the workloads
>>>>>> on 3 and 4 much higher (or are the computers much slower) than 1 and
>>>>>> 2? I'm wondering if the nodes are simply not keeping up with
>>>>>> corosync traffic. You might try adjusting the corosync token timeout
>>>>>> and retransmit counts to see if that reduces the node losses.
>>>>>>
>>>>>> --
>>>>>> Digimer
>>>>>> Papers and Projects: https://alteeve.ca/w/
>>>>>> What if the cure for cancer is trapped in the mind of a person
>>>>>> without access to education?
>>>>>>
>>>>>> --
>>>>>> Linux-cluster mailing list
>>>>>> Linux-cluster@xxxxxxxxxx
>>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster