Re: Strange problems with SCHED_RR

Dietmar Maurer <dietmar@xxxxxxxxxxx> · Wed, 16 Jan 2013 09:13:19 +0000

I forgot to mention that this happens with corosync 1.4.4.

To me, though, this looks like it could be an RT scheduler bug.

> We always run into strange problems when we enable the RT scheduler
> (SCHED_RR).
> 
> After some random time we get:
> 
> Jan 10 19:29:22 host1 corosync[1700]:   [TOTEM ] Retransmit List: a38e a38f a392 a393 a3a1 a3a7 a3a8 a3a9 a3aa a3ab a381 a382 a383 a384 a385 a386 a387 a388 a389 a38a a38b a38c a38d a390 a391 a394 a395
> Jan 10 19:29:32 host1 corosync[1700]:   [TOTEM ] A processor failed, forming new configuration.
> 
> Any ideas?
> 
> The same node runs without problems when we use a kernel with
> CONFIG_RT_GROUP_SCHED disabled, or when we start corosync with '-p'.
> 
> This happens with a RHEL 6.3 (OpenVZ) based kernel, and also with newer 3.x
> kernels.
> 
> But running without raised priority also seems dangerous, so in exec/main.c
> I tried using:
> 
>        setpriority(PRIO_PGRP, 0, -20);
> 
> That seems to work. I wonder whether it has any serious drawbacks.
> 
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss
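For comparison, here is a minimal C sketch of the two approaches mentioned above: switching to SCHED_RR versus staying in the normal scheduling class and raising the nice value of the process group with setpriority(PRIO_PGRP, 0, -20). It is not corosync's actual exec/main.c code. The helper names use_sched_rr() and use_nice_boost() are hypothetical, and using the maximum SCHED_RR priority is an assumption made for illustration. Both calls normally need root or CAP_SYS_NICE, so each helper returns 0 on success or the errno value on failure.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical helper: put the caller into the real-time round-robin
 * class at the highest SCHED_RR priority (an assumption; corosync may
 * choose a different value). Returns 0 or an errno value. */
int use_sched_rr(void)
{
    struct sched_param param;

    memset(&param, 0, sizeof param);
    param.sched_priority = sched_get_priority_max(SCHED_RR);
    if (param.sched_priority == -1)
        return errno;

    /* pid 0 means the caller; on Linux this applies to the calling
     * thread. Without CAP_SYS_NICE or a suitable RLIMIT_RTPRIO this
     * typically fails with EPERM. */
    if (sched_setscheduler(0, SCHED_RR, &param) == -1)
        return errno;
    return 0;
}

/* Hypothetical helper: stay in the normal (SCHED_OTHER) class and set
 * the nice value of the caller's process group to -20, as in the
 * quoted exec/main.c change. Returns 0 or an errno value. */
int use_nice_boost(void)
{
    /* PRIO_PGRP with who == 0 selects the caller's process group, so
     * every process in that group is reniced, not only this one. */
    if (setpriority(PRIO_PGRP, 0, -20) == -1)
        return errno;
    return 0;
}
```

A caller would normally use only one of the two. The practical difference is that a SCHED_RR task preempts every SCHED_OTHER task on its CPU, while a nice value of -20 only gives the task the largest share within the normal class, so it can still be delayed by other busy processes under heavy load.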