On Wed, 2020-04-29 at 11:05 +0200, Peter Zijlstra wrote:
> On Tue, Apr 28, 2020 at 06:20:32PM -0500, Scott Wood wrote:
> > On Wed, 2020-04-29 at 01:02 +0200, Peter Zijlstra wrote:
> > > On Tue, Apr 28, 2020 at 05:55:03PM -0500, Scott Wood wrote:
> > > > On Wed, 2020-04-29 at 00:09 +0200, Peter Zijlstra wrote:
> > > > > Also, if you move it this late, this is entirely the wrong
> > > > > place. If you do it after the context switch either use the
> > > > > balance_callback or put it in the idle path.
> > > > > 
> > > > > But what Valentin said; this needs a fair bit of support, the
> > > > > whole reason we've never done this is to avoid that double
> > > > > context switch...
> > > > 
> > > > balance_callback() enters with the rq lock held but BH not
> > > > separately
> > > 
> > > BH? softirqs you mean? Pray tell more.
> > 
> > In https://lore.kernel.org/lkml/5122CD9C.9070702@xxxxxxxxxx/ the need to
> > keep softirqs disabled during rebalance was brought up, but simply
> > wrapping the lock dropping in local_bh_enable()/local_bh_disable()
> > meant that local_bh_enable() would be called with interrupts disabled,
> > which isn't allowed.
> 
> That thread, nor your explanation make any sense -- why do we care about
> softirqs?,

I was trusting Steve's claim that that was the issue (it seemed plausible
given that system-wide rebalancing is done from a softirq). If things have
changed since then, great. If that was never the issue, then there's the
question of what caused the bug Sasha saw.

> nor do I see how placing it in finish_task_switch() helps
> with any of this.

It lets us do the local_bh_enable() after IRQs are enabled, since we don't
enter with any existing atomic context. Though I suppose we could instead
do another lock drop at the end of newidle_balance() just to enable
softirqs.

> > > > disabled, which interferes with the ability to enable interrupts
> > > > but not BH. It also gets called from rt_mutex_setprio() and
> > > > __sched_setscheduler(), and I didn't want the caller of those to
> > > > be stuck with the latency.
> > > 
> > > You're not reading it right.
> > 
> > Could you elaborate?
> 
> If you were to do a queue_balance_callback() from somewhere in the
> pick_next_task() machinery, then the balance_callback() at the end of
> __schedule() would run it, and it'd be gone. How would
> rt_mutex_setprio() / __sched_setscheduler() be affected?

The rq lock is dropped between queue_balance_callback() and the
balance_callback() at the end of __schedule(). What stops
setprio/setscheduler on another cpu from doing the callback at that point?

-Scott
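
To spell out the window I'm asking about, here's a rough sketch of the
interleaving I have in mind (simplified, and from memory of the call
paths, so I may have details wrong):

    CPU0 (__schedule())                     CPU1

    rq_lock(rq)   /* IRQs off */
    pick_next_task()
      queue_balance_callback(rq, ...)
    context_switch()
      finish_task_switch()
        /* rq lock dropped here */
                                            rt_mutex_setprio() or
                                            __sched_setscheduler()
                                              __task_rq_lock(p)   /* same rq */
                                              ...
                                              __task_rq_unlock(rq)
                                              balance_callback(rq)
                                                /* runs and clears the
                                                   callback CPU0 queued */
    balance_callback(rq)
      /* rq->balance_callback is NULL now;
         nothing left to run */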
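
And to make the earlier finish_task_switch()/local_bh_enable() point
concrete, this is roughly the ordering problem from the old thread as I
understand it (pseudocode, not the actual patch):

    /*
     * newidle_balance() runs with the rq lock held and IRQs disabled,
     * so wrapping the lock drop looks like:
     */
    local_bh_disable();                  /* keep softirqs off across the pull */
    raw_spin_unlock(&this_rq->lock);

    /* ... pull tasks from other CPUs ... */

    raw_spin_lock(&this_rq->lock);
    local_bh_enable();                   /* IRQs are still disabled here, so
                                            __local_bh_enable_ip() warns (IIRC) */

Doing the pull from finish_task_switch() instead means the
local_bh_enable() happens after IRQs have been re-enabled and we're not
inside any other atomic section.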