Hi,

On 11/27/24 04:17, K Prateek Nayak wrote:
> Hello Peter,
>
> On 9/10/2024 1:39 PM, tip-bot2 for Peter Zijlstra wrote:
>> The following commit has been merged into the sched/core branch of tip:
>>
>> Commit-ID:     2e05f6c71d36f8ae1410a1cf3f12848cc17916e9
>> Gitweb:        https://git.kernel.org/tip/2e05f6c71d36f8ae1410a1cf3f12848cc17916e9
>> Author:        Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> AuthorDate:    Fri, 06 Sep 2024 12:45:25 +02:00
>> Committer:     Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> CommitterDate: Tue, 10 Sep 2024 09:51:15 +02:00
>>
>> sched/eevdf: More PELT vs DELAYED_DEQUEUE
>>
>> Vincent and Dietmar noted that while commit fc1892becd56 fixes the
>> entity runnable stats, it does not adjust the cfs_rq runnable stats,
>> which are based off of h_nr_running.
>>
>> Track h_nr_delayed such that we can discount those and adjust the
>> signal.
>>
>> Fixes: fc1892becd56 ("sched/eevdf: Fixup PELT vs DELAYED_DEQUEUE")
>> Reported-by: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
>> Reported-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
>> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
>> Link: https://lkml.kernel.org/r/20240906104525.GG4928@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
> I've been testing this fix for a while now to see if it helps the
> regressions reported around EEVDF complete. The issue with negative
> "h_nr_delayed" reported by Luis previously seems to have been fixed as a
> result of commit 75b6499024a6 ("sched/fair: Properly deactivate
> sched_delayed task upon class change")

I recall having 75b6499024a6 in my testing tree and still noticing
unbalanced accounting for h_nr_delayed, where it would eventually get
decremented twice, leading to negative numbers. I might have to give it
another go if we're considering including the change as-is, just to
make sure. Since our setups are slightly different, we could be
exercising slightly different paths.

Did this patch help with the regressions you noticed, though?

> I've been running stress-ng for a while and haven't seen any cases of
> negative "h_nr_delayed". I'd also added the following WARN_ON() to see
> if there are any delayed tasks on the cfs_rq before switching to idle in
> some of my previous experiments and I did not see any splat during my
> benchmark runs.
>
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 621696269584..c19a31fa46c9 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -457,6 +457,9 @@ static void put_prev_task_idle(struct rq *rq, struct task_struct *prev, struct t
>
>  static void set_next_task_idle(struct rq *rq, struct task_struct *next, bool first)
>  {
> +        /* All delayed tasks must be picked off before switching to idle */
> +        SCHED_WARN_ON(rq->cfs.h_nr_delayed);
> +
>          update_idle_core(rq);
>          scx_update_idle(rq, true);
>          schedstat_inc(rq->sched_goidle);
> --
>
> If you are including this back, feel free to add:
>
> Tested-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>
>
>> [..snip..]
>
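
For reference, the kind of check I have in mind for the retest looks
roughly like the sketch below. This is just an illustration, not a
posted patch: the helper name and where it would be called from are
made up, and the cast assumes h_nr_delayed is an unsigned field, so an
underflow would wrap rather than show up as negative.

        /*
         * Hypothetical guarded decrement, to catch the double
         * decrement at its source rather than after the fact.
         */
        static inline void dec_h_nr_delayed(struct cfs_rq *cfs_rq)
        {
                /* Check before decrementing: an unsigned count wraps on underflow */
                SCHED_WARN_ON((int)cfs_rq->h_nr_delayed <= 0);
                cfs_rq->h_nr_delayed--;
        }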