Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED

Ingo Molnar <mingo@xxxxxxxxxx> · Tue, 19 Sep 2023 10:43:08 +0200

* Ingo Molnar <mingo@xxxxxxxxxx> wrote:

> > Yeah, the fact that we do presumably have PREEMPT_COUNT enabled in most 
> > distros does speak for just admitting that the PREEMPT_NONE / VOLUNTARY 
> > approach isn't actually used, and is only causing pain.
> 
> The macro-behavior of NONE/VOLUNTARY is still used & relied upon in 
> server distros - and that's the behavior that enterprise distros truly 
> cared about.
> 
> Micro-overhead of NONE/VOLUNTARY vs. FULL is nonzero but is in the 
> 'noise' category for all major distros I'd say.
> 
> And that's what Thomas's proposal achieves: keep the nicely 
> execution-batched NONE/VOLUNTARY scheduling behavior for SCHED_OTHER 
> tasks, while having the latency advantages of fully-preemptible kernel 
> code for RT and critical tasks.
> 
> So I'm fully on board with this. It would reduce the number of preemption 
> variants to just two: regular kernel and PREEMPT_RT. Yummie!

As an additional side note: with various changes such as EEVDF, the 
scheduler is a lot less preemption-happy these days, without wrecking 
latencies & timeslice distribution.

So in principle we might not even need the NEED_RESCHED_LAZY extra bit, 
which -rt uses as a kind of additional layer to make sure they don't change 
scheduling policy.

I.e. a modern scheduler might have mooted much of this change:

   4542057e18ca ("mm: avoid 'might_sleep()' in get_mmap_lock_carefully()")

... because now we'll only reschedule on timeslice exhaustion, or if a task 
comes in with a big deadline deficit.

And even the deadline-deficit wakeup preemption can be turned off further 
with:

    $ echo NO_WAKEUP_PREEMPTION > /debug/sched/features
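Writes to the features file are matched by name, and a NO_ prefix clears the bit instead of setting it. Below is a small sketch of that parsing, loosely modelled on the kernel's handling. The model_* names are hypothetical and the feature list is only an illustrative subset. The command above assumes debugfs is mounted at /debug rather than the default /sys/kernel/debug.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset of feature names; bit i corresponds to entry i. */
static const char *const model_feat_names[] = {
	"WAKEUP_PREEMPTION",
	"HRTICK",
	"TTWU_QUEUE",
};
#define MODEL_NR_FEATS (sizeof(model_feat_names) / sizeof(model_feat_names[0]))

/* Hypothetical helper: apply one write such as "NO_WAKEUP_PREEMPTION\n"
 * to *mask. Returns the feature index, or -1 for an unknown name. */
int model_feat_write(unsigned int *mask, const char *input)
{
	char buf[64];
	const char *name = buf;
	bool neg = false;
	size_t len;

	assert(mask != NULL && input != NULL);

	/* Copy and strip trailing whitespace, e.g. the newline from echo. */
	len = strlen(input);
	if (len >= sizeof(buf))
		return -1;
	memcpy(buf, input, len + 1);
	while (len > 0 && isspace((unsigned char)buf[len - 1]))
		buf[--len] = '\0';

	/* A "NO_" prefix means: clear the feature instead of setting it. */
	if (strncmp(name, "NO_", 3) == 0) {
		neg = true;
		name += 3;
	}

	for (size_t i = 0; i < MODEL_NR_FEATS; i++) {
		if (strcmp(name, model_feat_names[i]) == 0) {
			if (neg)
				*mask &= ~(1u << i);
			else
				*mask |= 1u << i;
			return (int)i;
		}
	}
	return -1; /* unknown names leave the mask untouched */
}
```

Writing the plain name again (WAKEUP_PREEMPTION) sets the bit back, which is how the default behavior is restored.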

And we are considering making that the default behavior for same-prio tasks 
(basically turning same-prio SCHED_OTHER tasks into SCHED_BATCH), which 
should be quite similar to what NEED_RESCHED_LAZY achieves on -rt.
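The overlap between SCHED_BATCH and the NO_WAKEUP_PREEMPTION case can be shown with a small decision function, loosely modelled on the fair-class wakeup-preemption check. A batch (or idle) wakee, or a cleared WAKEUP_PREEMPTION feature, means no wakeup preemption at all, so the tick alone ends the running task's slice. The model_* names and the earlier_deadline input are hypothetical simplifications, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policy tags standing in for SCHED_NORMAL/BATCH/IDLE. */
enum model_policy { MODEL_NORMAL, MODEL_BATCH, MODEL_IDLE };

/* Should a waking task preempt the running one?
 * wakeup_preemption_feat mirrors the WAKEUP_PREEMPTION feature bit;
 * earlier_deadline stands in for the EEVDF deadline comparison. */
bool model_check_wakeup(enum model_policy curr, enum model_policy wakee,
			bool wakeup_preemption_feat, bool earlier_deadline)
{
	assert(curr <= MODEL_IDLE && wakee <= MODEL_IDLE);

	/* An idle-policy task yields to any non-idle wakee. */
	if (curr == MODEL_IDLE && wakee != MODEL_IDLE)
		return true;

	/* Batch/idle wakees, or a disabled feature, never preempt on
	 * wakeup: the running task keeps the CPU until the tick decides
	 * its slice is used up. */
	if (wakee != MODEL_NORMAL || !wakeup_preemption_feat)
		return false;

	return earlier_deadline;
}
```

Making same-prio SCHED_OTHER wakeups take the "batch" branch would give them the same run-until-the-tick behavior without changing their policy.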

Thanks,

	Ingo