Dietmar,

Dietmar Maurer wrote:
> I forgot to mention that this happens with corosync 1.4.4.
>
> But to me this looks like an RT scheduler bug?

Seems to be.

>> We always run into strange problems when we enable the RT scheduler
>> (SCHED_RR).
>>
>> After some random time we get:
>>
>> Jan 10 19:29:22 host1 corosync[1700]: [TOTEM ] Retransmit List: a38e a38f
>>   a392 a393 a3a1 a3a7 a3a8 a3a9 a3aa a3ab a381 a382 a383 a384 a385 a386
>>   a387 a388 a389 a38a a38b a38c a38d a390 a391 a394 a395
>> Jan 10 19:29:32 host1 corosync[1700]: [TOTEM ] A processor failed, forming
>>   new configuration.
>>
>> Any ideas?
>>
>> The same node runs without problems when we use a kernel with
>> CONFIG_RT_GROUP_SCHED disabled, or when we start corosync with '-p'.
>>
>> This happens with a RHEL6.3 (openvz) based kernel, but also with newer
>> 3.X kernels.
>>
>> But running without raised priority also seems dangerous, so I tried
>> using (in exec/main.c):
>>
>>   setpriority(PRIO_PGRP, 0, -20)
>>
>> And that seems to work. I wonder if this has some serious drawbacks?

I don't think so. Actually, it might be nice to have corosync options like
-r (enable or disable RT priority) plus -p (enable or disable raised
priority).

Honza
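To make the -r / -p idea concrete, here is a minimal sketch in C of how the two switches could select between SCHED_RR and the setpriority() workaround quoted above. This is not corosync code: the names prio_mode, choose_prio_mode and apply_prio_mode are hypothetical, and the rule that the RT switch wins when both are set is an assumption made only for illustration.

```c
#include <assert.h>
#include <sched.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical modes; none of these names exist in corosync. */
enum prio_mode { PRIO_MODE_NONE, PRIO_MODE_NICE, PRIO_MODE_RT };

/*
 * Map the two proposed switches to a mode.
 * Assumption: RT priority takes precedence over raised (nice) priority.
 */
static enum prio_mode choose_prio_mode(int want_rt, int want_raised)
{
    if (want_rt)
        return PRIO_MODE_RT;
    if (want_raised)
        return PRIO_MODE_NICE;
    return PRIO_MODE_NONE;
}

/* Apply the chosen mode. Returns 0 on success, -1 on error (errno is set). */
static int apply_prio_mode(enum prio_mode mode)
{
    switch (mode) {
    case PRIO_MODE_NONE:
        /* Leave the scheduling policy and nice value untouched. */
        return 0;

    case PRIO_MODE_NICE:
        /* The workaround from the quoted mail: nice -20 for our process group. */
        return setpriority(PRIO_PGRP, 0, -20);

    case PRIO_MODE_RT: {
        struct sched_param sp;
        int max = sched_get_priority_max(SCHED_RR);

        if (max == -1)
            return -1;
        memset(&sp, 0, sizeof(sp));
        sp.sched_priority = max;
        /* pid 0 means the calling thread. */
        return sched_setscheduler(0, SCHED_RR, &sp);
    }
    }

    assert(0 && "unknown prio_mode");
    return -1;
}
```

Both the nice -20 and the SCHED_RR paths normally need root or CAP_SYS_NICE, so a caller would want to check the return value and log errno rather than assume the priority change took effect.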