On Tue, 20 Sept 2022 at 13:32, Hillf Danton <hdanton@xxxxxxxx> wrote:
>
> On 18 Sep 2022 12:46:00 +0200 Vincent Guittot <vincent.guittot@xxxxxxxxxx> wrote:
> > On Sun, 18 Sept 2022 at 00:58, Hillf Danton <hdanton@xxxxxxxx> wrote:
> > >
> > > On 16 Sep 2022 15:36:53 +0200 Vincent Guittot <vincent.guittot@xxxxxxxxxx> wrote:
> > > >
> > > > Hi Hillf,
> > > >
> > > > On Fri, 16 Sept 2022 at 14:03, Hillf Danton <hdanton@xxxxxxxx> wrote:
> > > > >
> > > > > Hello Vincent
> > > > >
> > > > > On 16 Sep 2022 10:03:02 +0200 Vincent Guittot <vincent.guittot@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > @@ -4606,6 +4608,7 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
> > > > > >
> > > > > >  	se = __pick_first_entity(cfs_rq);
> > > > > >  	delta = curr->vruntime - se->vruntime;
> > > > > > +	delta -= wakeup_latency_gran(curr, se);
> > > > > >
> > > > > >  	if (delta < 0)
> > > > > >  		return;
> > > > >
> > > > > What is derived from the latency nice you added is the runtime granularity,
> > > > > which has a role in preempting the current task.
> > > > >
> > > > > Given the same definition of latency nice as the nice, the runtime granularity
> > > > > can be computed without introducing the latency nice.
> > > > >
> > > > > Only for thoughts now.
> > > > >
> > > > > Hillf
> > > > >
> > > > > +++ b/kernel/sched/fair.c
> > > > > @@ -4569,7 +4569,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
> > > > >  static void
> > > > >  check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
> > > > >  {
> > > > > -	unsigned long ideal_runtime, delta_exec;
> > > > > +	unsigned long ideal_runtime, delta_exec, granu;
> > > > >  	struct sched_entity *se;
> > > > >  	s64 delta;
> > > > >
> > > > > @@ -4594,6 +4594,14 @@ check_preempt_tick(struct cfs_rq *cfs_rq
> > > > >  		return;
> > > > >
> > > > >  	se = __pick_first_entity(cfs_rq);
> > > > > +
> > > > > +	granu = sysctl_sched_min_granularity +
> > > > > +		(ideal_runtime - sysctl_sched_min_granularity) *
> > > > > +		(se->latency_nice + 20) / LATENCY_NICE_WIDTH;
> > > >
> > > > There is no latency_nice field in se but a latency_offset instead.
> > > >
> > > > Also at this step, we are sure that curr has run at least
> > > > sysctl_sched_min_granularity and we want now to compare curr's vruntime
> > > > with the first se's one. We take the latency offset into account to make
> > > > sure we will not preempt curr too early.
> > > >
> > > > > +
> > > > > +	if (delta_exec < granu)
> > > > > +		return;
> > > > > +
> > > > >  	delta = curr->vruntime - se->vruntime;
> > > > >
> > > > >  	if (delta < 0)
> > > > >  		return;
> > >
> > >  	if (delta > ideal_runtime)
> > >  		resched_curr(rq_of(cfs_rq));
> > >
> > > After another look, curr is not preempted without the gap in vruntime
> > > between curr and the first entity growing more than ideal runtime, while
> >
> > Curr can be preempted as it has run more than the ideal time (1st
> > test). This one is to make sure that the diff does not become too
> > large.
> > Here we reuse the same comparison as wakeup to make sure that a
> > newly curr will get a chance to run its ideal time after having
> > preempted at wakeup the previous current.
>
> IIUC it would take two checks to preempt correctly - the diff in vruntime
> is checked first to avoid preempting too early, then it is checked again
> with latency_offset taken into account to avoid preempting too late.

The 1st test in check_preempt_tick():

	if (delta_exec > ideal_runtime)

ensures that a resched happens after the current has run its slice.

The 2nd test:

	if (delta_exec < sysctl_sched_min_granularity)

ensures that current will run at least 3ms.

The 3rd one:

	if (delta > ideal_runtime)

is there to make sure that there is not too much diff between the
vruntimes. But we are comparing virtual runtime with execution time and,
as Peter mentioned in another thread, that's weird. [1] will fix it in
addition to ensuring a max runtime.

Then, current might have preempted first_entity a few ms earlier thanks
to its latency_offset. If the tick happens quickly after the preemption,
delta might be above ideal_runtime whereas current has not run its ideal
time yet.

[1] https://lore.kernel.org/all/20220916131538.24706-1-vincent.guittot@xxxxxxxxxx/

>
> +++ b/kernel/sched/fair.c
> @@ -4571,7 +4571,7 @@ check_preempt_tick(struct cfs_rq *cfs_rq
>  {
>  	unsigned long ideal_runtime, delta_exec;
>  	struct sched_entity *se;
> -	s64 delta;
> +	s64 delta, d2;
>
>  	ideal_runtime = sched_slice(cfs_rq, curr);
>  	delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
> @@ -4595,11 +4595,9 @@ check_preempt_tick(struct cfs_rq *cfs_rq
>
>  	se = __pick_first_entity(cfs_rq);
>  	delta = curr->vruntime - se->vruntime;
> +	d2 = delta - wakeup_latency_gran(curr, se);
>
> -	if (delta < 0)
> -		return;
> -
> -	if (delta > ideal_runtime)
> +	if (delta > ideal_runtime || d2 > ideal_runtime)
>  		resched_curr(rq_of(cfs_rq));
>  }
>