I forgot to mention that this happens with corosync 1.4.4.

But to me this looks like an RT scheduler bug?

> We always run into strange problems when we enable the RT scheduler
> (SCHED_RR).
>
> After some random time we get:
>
> Jan 10 19:29:22 host1 corosync[1700]: [TOTEM ] Retransmit List: a38e a38f a392 a393 a3a1 a3a7 a3a8 a3a9 a3aa a3ab a381 a382 a383 a384 a385 a386 a387 a388 a389 a38a a38b a38c a38d a390 a391 a394 a395
> Jan 10 19:29:32 host1 corosync[1700]: [TOTEM ] A processor failed, forming new configuration.
>
> Any ideas?
>
> The same node runs without problems when we use a kernel with
> CONFIG_RT_GROUP_SCHED disabled, or when we start corosync with '-p'.
>
> This happens with a RHEL6.3 (openvz) based kernel, but also with newer 3.X
> kernels.
>
> But running without raised priority also seems dangerous, so I tried using
> (in exec/main.c):
>
> setpriority(PRIO_PGRP, 0, -20)
>
> And that seems to work. I wonder if this has some serious drawbacks?

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
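
For reference, here is a minimal sketch of the setpriority() workaround quoted above, with error reporting added. The helper name set_nice() and its parameters are hypothetical and not taken from corosync's exec/main.c; it only illustrates how the call could be checked, assuming a Linux/POSIX system.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

/*
 * Hypothetical helper (not corosync code): set the nice value for the
 * target selected by `which` (PRIO_PROCESS, PRIO_PGRP or PRIO_USER),
 * using id 0, which means "the caller's own process / group / user".
 * Returns 0 on success, -1 on failure after logging the reason.
 */
int set_nice(int which, int nice_value)
{
    if (setpriority(which, 0, nice_value) == -1) {
        /* Report the failure and let the caller decide how to proceed. */
        fprintf(stderr, "setpriority(%d, 0, %d) failed: %s\n",
                which, nice_value, strerror(errno));
        return -1;
    }
    return 0;
}
```

The call from the quoted message would then be set_nice(PRIO_PGRP, -20). On Linux, lowering the nice value normally requires root, CAP_SYS_NICE or a suitable RLIMIT_NICE, so for an unprivileged user this is expected to fail with EACCES. Note that a nice value only affects the normal time-sharing scheduler; it does not give the process real-time priority the way SCHED_RR does.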