Re: [RFC][PATCH] fix SCHED_FIFO spec violation (backport)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 15, 2008 at 10:03 AM, Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:
> On Sat, 2008-07-05 at 08:18 -0700, Darren Hart wrote:
>> On Fri, 2008-07-04 at 15:08 +0200, John Kacur wrote:
>
>> > On Fri, Jul 4, 2008 at 12:41 AM, Darren Hart <dvhltc@xxxxxxxxxx> wrote:
>> > > Enqueue deprioritized RT tasks to head of prio array
>> > >
>> > > This patch backports Peter Z's enqueue to head of prio array on
>> > > de-prioritization to 2.6.24.7-rt14 which doesn't have the
>> > > enqueue_rt_entity and associated changes.
>> > >
>> > > I've run several long running real-time java benchmarks and it's
>> > > holding so far.  Steven, please consider this patch for inclusion
>> > > in the next 2.6.24.7-rtX release.
>> > >
>> > > Peter, I didn't include your Signed-off-by as only about half your
>> > > original patch applied to 2.6.24.7-r14.  If you're happy with this
>> > > version, would you also sign off?
>> > >
>> > > Signed-off-by: Darren Hart <dvhltc@xxxxxxxxxx>
>> > >
>> > >
>> > > ---
>> > > Index: linux-2.6.24.7-ibmrt2.6-view/include/linux/sched.h
>> > > ===================================================================
>> > > --- linux-2.6.24.7-ibmrt2.6-view.orig/include/linux/sched.h
>> > > +++ linux-2.6.24.7-ibmrt2.6-view/include/linux/sched.h
>> > > @@ -897,11 +897,16 @@ struct uts_namespace;
>> > >  struct rq;
>> > >  struct sched_domain;
>> > >
>> > > +#define ENQUEUE_WAKEUP 0x01
>> > > +#define ENQUEUE_HEAD   0x02
>> > > +
>> > > +#define DEQUEUE_SLEEP  0x01
>> > > +
>> >
>> > Question: is ENQUEUE_WAKEUP equal to DEQUEUE_SLEEP by design or
>> > coincidence?
>>
>> Coincidence.  The ENQUEUE_* flags are only to be used with the
>> enqueue_task* methods, while the DEQUEUE_* flags are for deqeue_task*.
>> Note that the conversion of sleep to the DEQUEUE_SLEEP flag isn't really
>> necessary as there is only the one flag, but it makes the calls
>> parallel, which I suspect was Peter's intention (but I speculate here).
>
> Indeed.
>
>> > The renaming of wakeup and sleep to flags makes it at
>> > least superficially seem like they overlap. Since a large part of the
>> > patch is renaming, it might be easier to understand if the renaming
>> > was done as a separate patch, but on the other hand, that is probably
>> > just a PITA. :)
>>
>> Seems a small enough patch to be all in one to me.  If others object
>> I'll split it out, but again, I tried to keep the backport as close to
>> Peter's original patch as possible.
>
> I just hacked together what was needed to post out an RFC, never made it
> a 'pretty' series ;-)
>
> One could go the extra length and make it multiple patches, but its not
> too big, so I'm not sure we should worry about that.
>
>> >
>> > >  struct sched_class {
>> > >        const struct sched_class *next;
>> > >
>> > > -       void (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
>> > > -       void (*dequeue_task) (struct rq *rq, struct task_struct *p, int sleep);
>> > > +       void (*enqueue_task) (struct rq *rq, struct task_struct *p, int flags);
>> > > +       void (*dequeue_task) (struct rq *rq, struct task_struct *p, int flags);
>> > >        void (*yield_task) (struct rq *rq);
>> > >        int  (*select_task_rq)(struct task_struct *p, int sync);
>> > >
>
>> > > Index: linux-2.6.24.7-ibmrt2.6-view/kernel/sched_fair.c
>> > > ===================================================================
>> > > --- linux-2.6.24.7-ibmrt2.6-view.orig/kernel/sched_fair.c
>> > > +++ linux-2.6.24.7-ibmrt2.6-view/kernel/sched_fair.c
>> > > @@ -756,10 +756,11 @@ static inline struct sched_entity *paren
>> > >  * increased. Here we update the fair scheduling stats and
>> > >  * then put the task into the rbtree:
>> > >  */
>> > > -static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int wakeup)
>> > > +static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>> > >  {
>> > >        struct cfs_rq *cfs_rq;
>> > >        struct sched_entity *se = &p->se;
>> > > +       int wakeup = flags & ENQUEUE_WAKEUP;
>> >
>> > Minor nit: was it necessary to create a new int, why not just flags &=
>> > ENQUEUE_WAKEUP, plus subsequent renaming where necessary.
>
> As Darren already said, the rename of wakeup/sleep to flags is needed
> because of the multiple value thing and consistency.
>
>> Well... I've copied the entire function here for context:
>>
>> static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int
>> flags)
>> {
>>       struct cfs_rq *cfs_rq;
>>       struct sched_entity *se = &p->se;
>>       int wakeup = flags & ENQUEUE_WAKEUP;
>>
>>       for_each_sched_entity(se) {
>>               if (se->on_rq)
>>                       break;
>>               cfs_rq = cfs_rq_of(se);
>>               enqueue_entity(cfs_rq, se, wakeup);
>>               wakeup = 1;
>>       }
>> }
>>
>> Note that "wakeup = 1;" for all but the initial entity which uses the
>> flag that was passed in.  So if this is correct behavior, then the new
>> integer seems like a reasonable approach to me.  Note that the
>> dequeue_task_fair has a parallel implementation.
>>
>> Peter, can you explain why only the first entity is subject to the value
>> of the passed-in flags in these two functions.  I understand this was
>> the orginal behavior as well.
>
> group scheduling :-)
>
> So, if you haven't ran away by now I'll continue explaining.
>
> The current group scheduler is a hierarchy of schedulers, what
> enqueue_task_fair() does is enqueue the task 'somewhere' in that
> hierarchy. And then walk up the hierarchy checking to see if each parent
> is already enqueued (the if (se->on_rq) bit). If so, it can stop since
> all further parents will then too be already enqueued. However, if the
> parent is not already enqueued if must enqueue him too.
>
> So what we do is dequeue sleep a scheduler from its parent scheduler
> when there are no more tasks to run, and enqueue wakeup when a new
> 'task' arrives. Note that a 'task' can be a so called super-task which
> is essentially another scheduler.
>
>

Hey Peter, thanks for the excellent explanations. I think Darren is
waiting for a Signed-off-by: you, since it is a backport of your work.
I was just reviewing it for fun. Darren if you want to add a
Reviewed-by: John Kacur <jkacur at gmail dot com>
to the patch, pls feel free too, but it is not necessary.
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [RT Stable]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]

  Powered by Linux