Re: [tip: sched/core] sched/eevdf: More PELT vs DELAYED_DEQUEUE

Hello Peter,

On 9/10/2024 1:39 PM, tip-bot2 for Peter Zijlstra wrote:
The following commit has been merged into the sched/core branch of tip:

Commit-ID:     2e05f6c71d36f8ae1410a1cf3f12848cc17916e9
Gitweb:        https://git.kernel.org/tip/2e05f6c71d36f8ae1410a1cf3f12848cc17916e9
Author:        Peter Zijlstra <peterz@xxxxxxxxxxxxx>
AuthorDate:    Fri, 06 Sep 2024 12:45:25 +02:00
Committer:     Peter Zijlstra <peterz@xxxxxxxxxxxxx>
CommitterDate: Tue, 10 Sep 2024 09:51:15 +02:00

sched/eevdf: More PELT vs DELAYED_DEQUEUE

Vincent and Dietmar noted that while commit fc1892becd56 fixes the
entity runnable stats, it does not adjust the cfs_rq runnable stats,
which are based off of h_nr_running.

Track h_nr_delayed such that we can discount those and adjust the
signal.

Fixes: fc1892becd56 ("sched/eevdf: Fixup PELT vs DELAYED_DEQUEUE")
Reported-by: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
Reported-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Link: https://lkml.kernel.org/r/20240906104525.GG4928@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

I've been testing this fix for a while now to see if it helps with the
regressions reported around EEVDF complete. The issue with a negative
"h_nr_delayed" reported by Luis previously seems to have been fixed as a
result of commit 75b6499024a6 ("sched/fair: Properly deactivate
sched_delayed task upon class change").

I've been running stress-ng for a while and haven't seen any cases of a
negative "h_nr_delayed". In some of my previous experiments I had also
added the following SCHED_WARN_ON() to check whether any delayed tasks
remain on the cfs_rq before switching to idle, and I did not see any
splats during my benchmark runs.

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 621696269584..c19a31fa46c9 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -457,6 +457,9 @@ static void put_prev_task_idle(struct rq *rq, struct task_struct *prev, struct t
static void set_next_task_idle(struct rq *rq, struct task_struct *next, bool first)
 {
+	/* All delayed tasks must be picked off before switching to idle */
+	SCHED_WARN_ON(rq->cfs.h_nr_delayed);
+
 	update_idle_core(rq);
 	scx_update_idle(rq, true);
 	schedstat_inc(rq->sched_goidle);
--

If you are picking this up again, feel free to add:

Tested-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>

[..snip..]

--
Thanks and Regards,
Prateek




