Re: Strange problems with SCHED_RR

Jan Friesse <jfriesse@xxxxxxxxxx> · Fri, 18 Jan 2013 12:18:34 +0100



Dietmar,

Dietmar Maurer wrote:
> I forgot to mention that this happens with corosync 1.4.4.
> 
> But to me this looks like an RT scheduler bug?

It seems to be.

> 
>> We always run into strange problems when we enable the RT scheduler
>> (SCHED_RR).
>>
>> After some random time we get:
>>
>> Jan 10 19:29:22 host1 corosync[1700]:   [TOTEM ] Retransmit List: a38e a38f a392 a393 a3a1 a3a7 a3a8 a3a9 a3aa a3ab a381 a382 a383 a384 a385 a386 a387 a388 a389 a38a a38b a38c a38d a390 a391 a394 a395
>> Jan 10 19:29:32 host1 corosync[1700]:   [TOTEM ] A processor failed, forming new configuration.
>>
>> Any ideas?
>>
>> The same node runs without problems when we use a kernel with
>> CONFIG_RT_GROUP_SCHED disabled, or when we start corosync with '-p'.
>>
>> This happens with an OpenVZ kernel based on RHEL 6.3, but also with newer
>> 3.x kernels.
>>
>> But running without raised priority also seems dangerous, so I tried using
>> (in exec/main.c):
>>
>>        setpriority(PRIO_PGRP, 0, -20);
>>
>> And that seems to work. Does this have any serious drawbacks?

I don't think so. Actually, it may be nice to have corosync options like
-r (enable or disable RT priority) and -p (enable or disable raised priority).
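A minimal sketch of how such an -r/-p split might look in C. The names apply_priority and enum prio_mode, and the choice to fall back from SCHED_RR to setpriority() when both are requested, are hypothetical and not taken from corosync. The sketch relies only on the standard sched_get_priority_max(2), sched_setscheduler(2) and setpriority(2) calls, and assumes the process has the privileges (e.g. CAP_SYS_NICE) that either change normally requires.

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Which priority mechanism ended up in effect (hypothetical type). */
enum prio_mode {
    PRIO_MODE_NONE, /* default SCHED_OTHER policy, nice value untouched */
    PRIO_MODE_NICE, /* SCHED_OTHER with the process group reniced to -20 */
    PRIO_MODE_RR    /* SCHED_RR real-time policy at its maximum priority */
};

/*
 * Hypothetical helper mirroring the suggested options:
 *   use_rt     ~ "-r": try the SCHED_RR real-time policy
 *   raise_prio ~ "-p": raise priority with setpriority() instead
 * If SCHED_RR is requested but refused, fall back to the nice-based path
 * when raise_prio is also set. Returns the mode actually applied.
 */
static enum prio_mode apply_priority(int use_rt, int raise_prio)
{
    if (use_rt) {
        struct sched_param param;

        memset(&param, 0, sizeof(param));
        param.sched_priority = sched_get_priority_max(SCHED_RR);
        /* On Linux, pid 0 means the calling thread. */
        if (param.sched_priority != -1 &&
            sched_setscheduler(0, SCHED_RR, &param) == 0) {
            return PRIO_MODE_RR;
        }
        fprintf(stderr, "SCHED_RR not applied: %s\n", strerror(errno));
    }

    if (raise_prio) {
        /* PRIO_PGRP with who == 0 targets the caller's process group;
         * -20 is the most favorable nice value. */
        if (setpriority(PRIO_PGRP, 0, -20) == 0) {
            return PRIO_MODE_NICE;
        }
        fprintf(stderr, "setpriority() failed: %s\n", strerror(errno));
    }

    return PRIO_MODE_NONE;
}
```

A daemon could call apply_priority(1, 1) after parsing its command line: if the kernel refuses SCHED_RR, it still ends up at nice -20. Whether a fallback like this is appropriate for a cluster daemon, rather than failing loudly, is a judgment call.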


Honza
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss