On Tue, Nov 21, 2023 at 02:17:21PM +0100, Tobias Huschle wrote:
> We applied both suggested patch options and ran the test again, so
>
> sched/eevdf: Fix vruntime adjustment on reweight
> sched/fair: Update min_vruntime for reweight_entity() correctly
>
> and
>
> sched/eevdf: Delay dequeue
>
> Unfortunately, both variants do NOT fix the problem.
> The regression remains unchanged.

Thanks for testing.

> I will continue getting myself familiar with how cgroups are scheduled to dig
> deeper here. If there are any other ideas, I'd be happy to use them as a
> starting point for further analysis.
>
> Would additional traces still be of interest? If so, I would be glad to
> provide them.

So, since it got bisected to the placement logic, but is a cgroup related
issue, I was thinking that 'Delay dequeue' might not cut it, since that only
works for tasks, not the internal entities.

The below should also work for internal entities, but last time I poked
around with it I had some regressions elsewhere -- you know, fix one,
wreck another type of situation.

But still, could you please give it a go -- it applies cleanly to linus'
master and -rc2.

---
Subject: sched/eevdf: Revenge of the Sith^WSleepers

For tasks that have received excess service (negative lag) allow them to
gain parity (zero lag) by sleeping.

Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
---
 kernel/sched/fair.c     | 36 ++++++++++++++++++++++++++++++++++++
 kernel/sched/features.h |  6 ++++++
 2 files changed, 42 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d7a3c63a2171..b975e4b07a68 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5110,6 +5110,33 @@ static inline void update_misfit_status(struct task_struct *p, struct rq *rq) {}

 #endif /* CONFIG_SMP */

+static inline u64
+entity_vlag_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
+{
+	u64 now, vdelta;
+	s64 delta;
+
+	if (!(flags & ENQUEUE_WAKEUP))
+		return se->vlag;
+
+	if (flags & ENQUEUE_MIGRATED)
+		return 0;
+
+	now = rq_clock_task(rq_of(cfs_rq));
+	delta = now - se->exec_start;
+	if (delta < 0)
+		return se->vlag;
+
+	if (sched_feat(GENTLE_SLEEPER))
+		delta /= 2;
+
+	vdelta = __calc_delta(delta, NICE_0_LOAD, &cfs_rq->load);
+	if (vdelta < -se->vlag)
+		return se->vlag + vdelta;
+
+	return 0;
+}
+
 static void
 place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 {
@@ -5133,6 +5160,15 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)

 		lag = se->vlag;

+		/*
+		 * Allow tasks that have received too much service (negative
+		 * lag) to (re)gain parity (zero lag) by sleeping for the
+		 * equivalent duration. This ensures they will be readily
+		 * eligible.
+		 */
+		if (sched_feat(PLACE_SLEEPER) && lag < 0)
+			lag = entity_vlag_sleeper(cfs_rq, se, flags);
+
 		/*
 		 * If we want to place a task and preserve lag, we have to
 		 * consider the effect of the new entity on the weighted
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index a3ddf84de430..722282d3ed07 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -7,6 +7,12 @@ SCHED_FEAT(PLACE_LAG, true)
 SCHED_FEAT(PLACE_DEADLINE_INITIAL, true)
 SCHED_FEAT(RUN_TO_PARITY, true)

+/*
+ * Let sleepers earn back lag, but not more than 0-lag. GENTLE_SLEEPERS earn at
+ * half the speed.
+ */
+SCHED_FEAT(PLACE_SLEEPER, true)
+SCHED_FEAT(GENTLE_SLEEPER, true)

 /*
  * Prefer to schedule the task we woke last (assuming it failed
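
To make the earn-back arithmetic concrete, below is a small standalone sketch
of what entity_vlag_sleeper() computes. This is userspace C, not part of the
patch; the NICE_0_LOAD value and the queue load are made-up numbers, and
__calc_delta() is reduced to a plain multiply/divide:

/*
 * Standalone sketch of the earn-back-lag arithmetic (userspace, not kernel
 * code). NICE_0_LOAD and the queue load below are illustrative values only,
 * and __calc_delta() is approximated by a plain multiply/divide.
 */
#include <stdio.h>
#include <stdint.h>

#define NICE_0_LOAD	1024ULL

/*
 * Sleeping earns back virtual time at rate NICE_0_LOAD / queue load,
 * halved for GENTLE_SLEEPER, capped so the lag never goes past 0.
 * Assumes vlag <= 0; the patch only calls this when lag < 0.
 */
static int64_t vlag_after_sleep(int64_t vlag, uint64_t sleep_ns,
				uint64_t cfs_rq_load, int gentle)
{
	uint64_t vdelta;

	if (gentle)
		sleep_ns /= 2;

	vdelta = sleep_ns * NICE_0_LOAD / cfs_rq_load;
	if (vdelta < (uint64_t)(-vlag))	/* not enough sleep to reach 0-lag */
		return vlag + vdelta;

	return 0;
}

int main(void)
{
	/* -3ms of lag, 10ms of sleep, queue load of 2 * NICE_0_LOAD: */
	long long l = vlag_after_sleep(-3000000, 10000000, 2 * NICE_0_LOAD, 1);

	/* GENTLE_SLEEPER: 10ms/2 = 5ms wall time -> 2.5ms virtual, so -0.5ms */
	printf("%lld\n", l);
	return 0;
}

Both knobs can also be flipped at runtime, e.g. by writing NO_PLACE_SLEEPER or
NO_GENTLE_SLEEPER to /sys/kernel/debug/sched/features on a CONFIG_SCHED_DEBUG
kernel, which should make it easy to compare runs with and without the change.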