* Ingo Molnar <mingo@xxxxxxxxxx> wrote:

> > Yeah, the fact that we do presumably have PREEMPT_COUNT enabled in most
> > distros does speak for just admitting that the PREEMPT_NONE / VOLUNTARY
> > approach isn't actually used, and is only causing pain.
>
> The macro-behavior of NONE/VOLUNTARY is still used & relied upon in
> server distros - and that's the behavior that enterprise distros truly
> cared about.
>
> Micro-overhead of NONE/VOLUNTARY vs. FULL is nonzero but is in the
> 'noise' category for all major distros I'd say.
>
> And that's what Thomas's proposal achieves: keep the nicely
> execution-batched NONE/VOLUNTARY scheduling behavior for SCHED_OTHER
> tasks, while having the latency advantages of fully-preemptible kernel
> code for RT and critical tasks.
>
> So I'm fully on board with this. It would reduce the number of preemption
> variants to just two: regular kernel and PREEMPT_RT. Yummie!

As an additional side note: with various changes such as EEVDF the
scheduler is a lot less preemption-happy these days, without wrecking
latencies & timeslice distribution.

So in principle we might not even need the NEED_RESCHED_LAZY extra bit,
which -rt uses as a kind of additional layer to make sure they don't change
scheduling policy.

Ie. a modern scheduler might have mooted much of this change:

   4542057e18ca ("mm: avoid 'might_sleep()' in get_mmap_lock_carefully()")

... because now we'll only reschedule on timeslice exhaustion, or if a task
comes in with a big deadline deficit.

And even the deadline-deficit wakeup preemption can be turned off further
with:

   $ echo NO_WAKEUP_PREEMPTION > /debug/sched/features

And we are considering making that the default behavior for same-prio tasks
- basically turn same-prio SCHED_OTHER tasks into SCHED_BATCH - which
should be quite similar to what NEED_RESCHED_LAZY achieves on -rt.

Thanks,

	Ingo
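
For anyone wanting to try the wakeup-preemption toggle mentioned above, here is a minimal sketch, not a definitive recipe. It assumes debugfs is mounted at /sys/kernel/debug (the mail abbreviates this as /debug), that the kernel was built with scheduler debugging, and that you are root. The helper names `set_sched_feature` and `show_sched_features` and the `SCHED_FEATURES` variable are hypothetical, introduced only for illustration.

```shell
# Assumed default location of the scheduler feature file; override
# SCHED_FEATURES if debugfs is mounted elsewhere on your system.
SCHED_FEATURES="${SCHED_FEATURES:-/sys/kernel/debug/sched/features}"

# Hypothetical helper: write one feature token to the feature file,
# e.g. NO_WAKEUP_PREEMPTION to disable, WAKEUP_PREEMPTION to re-enable.
set_sched_feature() {
    feature="$1"
    if [ ! -w "$SCHED_FEATURES" ]; then
        echo "cannot write $SCHED_FEATURES (need root and debugfs?)" >&2
        return 1
    fi
    echo "$feature" > "$SCHED_FEATURES"
}

# Hypothetical helper: print the feature file so the change can be checked.
show_sched_features() {
    cat "$SCHED_FEATURES"
}
```

Usage would be `set_sched_feature NO_WAKEUP_PREEMPTION` followed by `show_sched_features`, and `set_sched_feature WAKEUP_PREEMPTION` to restore the default. The setting is not persistent across reboots.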