I'm going to pretend to have never seen the prior two patches. They do
absolutely horrible things for unspecified reasons. You've utterly failed
to explain what exactly is taking that 1ms+.

newidle_balance() already has 'stop, you're spending too much time'
controls; you've failed to explain how those are falling short and why
they cannot be improved.

On Wed, Apr 28, 2021 at 06:28:21PM -0500, Scott Wood wrote:
> The CFS load balancer can take a little while, to the point of it having
> a special LBF_NEED_BREAK flag, when the task moving code takes a
> breather.
>
> However, at that point it will jump right back in to load balancing,
> without checking whether the CPU has gained any runnable real time
> (or deadline) tasks.
>
> Break out of load balancing in the CPU_NEWLY_IDLE case, to allow the
> scheduling of the RT task. Without this, latencies of over 1ms are
> seen on large systems.
>
> Signed-off-by: Rik van Riel <riel@xxxxxxxxxx>
> Reported-by: Clark Williams <williams@xxxxxxxxxx>
> Signed-off-by: Clark Williams <williams@xxxxxxxxxx>
> [swood: Limit change to newidle]
> Signed-off-by: Scott Wood <swood@xxxxxxxxxx>
> ---
> v2: Only break out of newidle balancing
>
>  kernel/sched/fair.c  | 24 ++++++++++++++++++++----
>  kernel/sched/sched.h |  6 ++++++
>  2 files changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index aa8c87b6aff8..c3500c963af2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9502,10 +9502,21 @@ imbalanced_active_balance(struct lb_env *env)
>  	return 0;
>  }
>
> -static int need_active_balance(struct lb_env *env)
> +static bool stop_balance_early(struct lb_env *env)
> +{
> +	return env->idle == CPU_NEWLY_IDLE && rq_has_higher_tasks(env->dst_rq);
> +}
> +
> +static int need_active_balance(struct lb_env *env, int *continue_balancing)
>  {
>  	struct sched_domain *sd = env->sd;
>
> +	/* Run the realtime task now; load balance later. */
> +	if (stop_balance_early(env)) {
> +		*continue_balancing = 0;
> +		return 0;
> +	}

This placement doesn't make any sense. You very much want this to return
true for the sd->balance_interval = sd->min_interval block, for example.
And the other callsite already has an if (idle != CPU_NEWLY_IDLE)
condition to use.

Also, I don't think we care about the 'higher' thing here (either);
newidle is about getting *any* work here. If there's something to do, we
don't need to do more.

> +
>  	if (asym_active_balance(env))
>  		return 1;
>
> @@ -9550,7 +9561,7 @@ static int should_we_balance(struct lb_env *env)
>  	 * to do the newly idle load balance.
>  	 */
>  	if (env->idle == CPU_NEWLY_IDLE)
> -		return 1;
> +		return !rq_has_higher_tasks(env->dst_rq);

rq_has_higher_tasks() makes no sense here; newidle can stop the moment
nr_running != 0 (rough sketch at the end of this mail).

>
>  	/* Try to find first idle CPU */
>  	for_each_cpu_and(cpu, group_balance_mask(sg), env->cpus) {
>
> @@ -9660,6 +9671,11 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>
>  	local_irq_restore(rf.flags);
>
> +	if (stop_balance_early(&env)) {
> +		*continue_balancing = 0;
> +		goto out;
> +	}

Same thing.

> +
>  	if (env.flags & LBF_NEED_BREAK) {
>  		env.flags &= ~LBF_NEED_BREAK;
>  		goto more_balance;
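
FWIW, the direction I mean is roughly the below -- an abbreviated sketch
only, not the patch above and not compile-tested; it drops the new helper
and just looks at the nr_running count struct rq already carries (dst_rq
being this CPU's rq in the newidle case):

	/*
	 * Sketch: for newidle balancing, stop the moment this CPU has
	 * picked up *any* runnable task; no "higher priority" helper
	 * needed.
	 */
	static int should_we_balance(struct lb_env *env)
	{
		if (env->idle == CPU_NEWLY_IDLE)
			return !env->dst_rq->nr_running;

		/* ... existing first-idle-CPU selection stays as-is ... */
		return 1;
	}

That also leaves need_active_balance() alone; as said above, the other
callsite already has its if (idle != CPU_NEWLY_IDLE) condition to hang
any further check off.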