On Tue, Sep 19 2023 at 10:43, Ingo Molnar wrote:
> * Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> Ie. a modern scheduler might have mooted much of this change:
>
>    4542057e18ca ("mm: avoid 'might_sleep()' in get_mmap_lock_carefully()")
>
> ... because now we'll only reschedule on timeslice exhaustion, or if a task
> comes in with a big deadline deficit.
>
> And even the deadline-deficit wakeup preemption can be turned off further
> with:
>
>    $ echo NO_WAKEUP_PREEMPTION > /debug/sched/features
>
> And we are considering making that the default behavior for same-prio tasks
> - basically turn same-prio SCHED_OTHER tasks into SCHED_BATCH - which
> should be quite similar to what NEED_RESCHED_LAZY achieves on -rt.

I don't think that you can get rid of NEED_RESCHED_LAZY for !RT because
there is a clear advantage of having the return to user preemption
point.

It spares the kernel/user transition that would otherwise happen just to
get the task back via the timeslice interrupt. I experimented with that
on RT and the result was definitely worse.

We surely can revisit that, but I'd really start with the
straightforward mappable LAZY bit approach and if experimentation turns
out to provide good enough results by not setting that bit at all, then
we still can do so without changing anything except the core scheduler
decision logic.

It's again a cheap thing due to the way the return to user TIF handling
works:

	ti_work = read_thread_flags();

	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
		ti_work = exit_to_user_mode_loop(regs, ti_work);

TIF_LAZY_RESCHED is part of EXIT_TO_USER_MODE_WORK, so the non-work case
does not become more expensive than today. If any of the bits is set,
then the slowpath won't get measurably different performance whether the
bit is evaluated or not in exit_to_user_mode_loop().

As we really want TIF_LAZY_RESCHED for RT, we just keep all of this
consistent in terms of code, and it is purely a scheduler decision
whether it utilizes it or not.
As a consequence PREEMPT_RT is no longer special in that regard and the
main RT difference becomes the lock substitution and forced interrupt
threading.

For the magic 'spare me the extra conditional' optimization of
exit_to_user_mode_loop() if LAZY can be optimized out for !RT because
the scheduler is sooo clever (which I doubt), we can just use the same
approach as for other TIF bits and define them to 0 :)

So let's start consistent and optimize on top if really required.

Thanks,

        tglx
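
For illustration only, the mask check on the return to user path and the
"define them to 0" trick can be modelled in plain userspace C roughly as
below. This is a sketch under assumptions, not the kernel code: the bit
values, the MODEL_HAVE_LAZY switch, the two counters and the *_model()
function names are all made up for the example, and the real
exit_to_user_mode_loop() re-reads the flags in a loop and handles far
more work bits.

```c
/*
 * Userspace model only. Bit values, MODEL_HAVE_LAZY and the *_model()
 * helpers are hypothetical and do not match the kernel's definitions.
 */
#include <assert.h>

#define TIF_NEED_RESCHED	(1UL << 0)
#define TIF_SIGPENDING		(1UL << 1)

/* Build with -DMODEL_HAVE_LAZY=0 to model a config without the lazy bit. */
#ifndef MODEL_HAVE_LAZY
#define MODEL_HAVE_LAZY 1
#endif

#if MODEL_HAVE_LAZY
#define TIF_LAZY_RESCHED	(1UL << 2)
#else
/* Defined to 0: every test of the bit folds away at compile time. */
#define TIF_LAZY_RESCHED	0UL
#endif

#define EXIT_TO_USER_MODE_WORK \
	(TIF_NEED_RESCHED | TIF_SIGPENDING | TIF_LAZY_RESCHED)

static int slowpath_entries;	/* how often the slow path was entered */
static int resched_calls;	/* how often the model "called schedule()" */

/* Slow path: only reached when at least one work bit is set. */
static unsigned long exit_to_user_mode_loop_model(unsigned long ti_work)
{
	assert(ti_work & EXIT_TO_USER_MODE_WORK);
	slowpath_entries++;

	/* One extra bit in an existing test: the "cheap" part. */
	if (ti_work & (TIF_NEED_RESCHED | TIF_LAZY_RESCHED)) {
		resched_calls++;	/* stands in for schedule() */
		ti_work &= ~(TIF_NEED_RESCHED | TIF_LAZY_RESCHED);
	}

	if (ti_work & TIF_SIGPENDING)
		ti_work &= ~TIF_SIGPENDING;	/* stands in for signal work */

	return ti_work;
}

/* Fast path: a single mask test, regardless of how many bits the mask has. */
static unsigned long exit_to_user_mode_model(unsigned long ti_work)
{
	if (ti_work & EXIT_TO_USER_MODE_WORK)
		ti_work = exit_to_user_mode_loop_model(ti_work);
	return ti_work;
}
```

In this model a call with no work bits set never enters the slow path,
whether or not the lazy bit is part of the mask, and a call with only
TIF_LAZY_RESCHED set reschedules once on the way out to user space.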