Re: [PATCH 1/2] sched: push rt tasks only if newly activated tasks have been added

"Gregory Haskins" <ghaskins@xxxxxxxxxx> · Tue, 22 Apr 2008 10:38:32 -0600

Hi Dmitry,

(Disclaimer: I am sick with a fever today, so hopefully I'm grokking your email properly and not about to say something stupid ;)

>>> On Tue, Apr 22, 2008 at 11:30 AM, in message
<b647ffbd0804220830h6524e788n1467b027bc5bc4d2@xxxxxxxxxxxxxx>, "Dmitry
Adamushko" <dmitry.adamushko@xxxxxxxxx> wrote: 
> Hi Gregory,
> 
> 
> consider the following 2-cpu system: cpu0 and cpu1.
> 
> cpu0: is idle --> in such a state, it never pulls RT tasks on its own.
> 
> T0 and T1 are RT tasks
> 
> 
> square#0:
> 
> cpu1:  T0 is running
> 
> T1 is of the same prio as T0 (shouldn't really matter but to get the
> same result it would require altering the flow of events slightly)
> 
> T1's affinity allows it to be run only on cpu1.
> T0 can run on both.
> 
> try_to_wake_up() is called for T1.
> |
> --> select_task_rq_rt() => gives cpu1
> |
> --> task_wake_up_rt()
>    |
>    ---> push_rt_tasks() -> rq->rt.pushed = 1
> 
> now, neither T1 (due to its affinity), nor T0 (it's running) can be
> pushed away to cpu0.
> 
> [ btw., (1) I'd expect that this task_wake_up_rt() thing should be
> redundant, logically-wise... I'll check once more and comment later
> on.

They are both necessary. The key is that select_task_rq() is a best-effort routing attempt, whereas task_wake_up() is the authoritative router. Doing the push after activation let us use a clever and significant optimization on the pull side that Steven came up with. The details of that optimization escape me now, but I do remember it was important to the design. Later we added the select_task_rq() logic (see git-id 318e0893) to further optimize routing by finding a likely good home before activation takes place, which saves an activation/deactivation cycle. It still needs the post-activation router to protect against race conditions, though, since it is only best-effort.

> (2) any example when (p->prio >= rq->rt.highest_prio) is not true in
> task_wake_up_rt() ?

Hmm... good catch. It looks like it should be "p->prio >= rq->curr->prio", since we only need to be concerned with pushing here if the task is not going to preempt current. Do you agree, Steven, or am I missing something?

> ]
> 
> as a result, rq->rt.pushed == 1.
> 
> Now, post_schedule_rt() won't call push_rt_tasks().
> 
> T0 and T1 are both running for some time on cpu1 (possibly
> context-switching if they are both of SCHED_RR type).
> 
> Then they both block, _first_ T1 and then T0.
> 
> After some interval of time, they wake up (let's say they are
> periodic) in the following order: _first_ T0 and then T1.
> 
> rq->rt.pushed becomes 0 and here we are back to square#0. The whole
> story repeats again.
> 
> cpu0 is idle so it won't pull T0. Both T0 and T1 are competing for the
> same cpu. Not good.
> 
> am I missing smth?

No, I think you are indeed correct. However, I would consider the root cause of the problem to have existed before the "pushed" flag, so perhaps we need to address this at a different level. The case you present would always have been problematic for FIFO, and before the "pushed" patch it would eventually have "worked" for RR. But I'm not sure I like relying on how it used to work to fix up the system. Even at best, T1 would have experienced a latency equal to the remainder of T0's timeslice.

Rather, I think we need to address the preemption behavior for the case where a migratory task is running on the cpu and a non-migratory task wakes up. If they are equal in numerical priority, perhaps we should treat "non-migratory" as the tie-breaker. In this case, T1 would preempt T0 on cpu1, and then we would push T0 to cpu0. I haven't quite thought through all the details of how this would work yet. Perhaps I should wait until my fever lifts. ;) Thoughts?

-Greg

--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html