Hello Peter,

On 9/10/2024 1:39 PM, tip-bot2 for Peter Zijlstra wrote:
> The following commit has been merged into the sched/core branch of tip:
>
> Commit-ID:     2e05f6c71d36f8ae1410a1cf3f12848cc17916e9
> Gitweb:        https://git.kernel.org/tip/2e05f6c71d36f8ae1410a1cf3f12848cc17916e9
> Author:        Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> AuthorDate:    Fri, 06 Sep 2024 12:45:25 +02:00
> Committer:     Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> CommitterDate: Tue, 10 Sep 2024 09:51:15 +02:00
>
> sched/eevdf: More PELT vs DELAYED_DEQUEUE
>
> Vincent and Dietmar noted that while commit fc1892becd56 fixes the
> entity runnable stats, it does not adjust the cfs_rq runnable stats,
> which are based off of h_nr_running.
>
> Track h_nr_delayed such that we can discount those and adjust the
> signal.
>
> Fixes: fc1892becd56 ("sched/eevdf: Fixup PELT vs DELAYED_DEQUEUE")
> Reported-by: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
> Reported-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> Link: https://lkml.kernel.org/r/20240906104525.GG4928@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
I've been testing this fix for a while now to see if it helps the
regressions reported around EEVDF Complete. The issue with negative
"h_nr_delayed" that Luis reported previously seems to have been fixed as
a result of commit 75b6499024a6 ("sched/fair: Properly deactivate
sched_delayed task upon class change"). I've been running stress-ng for a
while and haven't seen any cases of negative "h_nr_delayed".

In some of my previous experiments I'd also added the following
WARN_ON() to check whether any delayed tasks remain on the cfs_rq before
switching to idle, and I did not see any splat during my benchmark runs:

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 621696269584..c19a31fa46c9 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -457,6 +457,9 @@ static void put_prev_task_idle(struct rq *rq, struct task_struct *prev, struct t
 static void set_next_task_idle(struct rq *rq, struct task_struct *next, bool first)
 {
+	/* All delayed tasks must be picked off before switching to idle */
+	SCHED_WARN_ON(rq->cfs.h_nr_delayed);
+
 	update_idle_core(rq);
 	scx_update_idle(rq, true);
 	schedstat_inc(rq->sched_goidle);
--

If you are picking this up, feel free to add:

Tested-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>
[..snip..]
-- Thanks and Regards, Prateek