On 2025-02-11 10:28:01 [-0500], Steven Rostedt wrote:
> On Tue, 11 Feb 2025 09:21:38 +0100
> Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> wrote:
>
> > We don't follow this behaviour exactly today.
> >
> > Adding this behaviour back vs the behaviour we have now doesn't seem
> > to improve anything at visible levels. We don't have a counter but we
> > can look at the RCU nesting counter, which should be zero once locks
> > have been dropped. So this can be used for testing.
> >
> > But as I said: using "run to completion" and preempting on the return
> > to userland, rather than once the lazy flag is seen and all locks
> > have been released, appears to be better.
> >
> > It is (now) possible that you run for a long time and get preempted
> > while holding a spinlock_t. It is however more likely that you
> > release all locks and get preempted while returning to userland.
>
> IIUC, today, LAZY causes all SCHED_OTHER tasks to act more like
> PREEMPT_NONE. Is that correct?

Well. The first sched-tick sets the LAZY bit, the second sched-tick
forces a resched (a minimal sketch of this two-tick escalation is at the
end of this mail). On PREEMPT_NONE the sched-tick would set NEED_RESCHED
while nothing would force a resched until the task decides to invoke
schedule() on its own. So it is slightly different for kernel threads.
For userland tasks it does not matter whether it is LAZY or NONE: either
way there is a resched on the return to userland after the sched-tick.

> Now that the PREEMPT_RT is not one of the preemption selections, when you
> select PREEMPT_RT, you can pick between CONFIG_PREEMPT and
> CONFIG_PREEMPT_LAZY. Where CONFIG_PREEMPT will preempt the kernel at the
> scheduler tick if preemption is enabled and CONFIG_PREEMPT_LAZY will
> not preempt the kernel on a scheduler tick and wait for exit to user space.

This is not specific to RT but to FULL vs LAZY. But yes. However, the
second sched-tick will force a preemption point even without the exit
to userland.

> Sebastian,
>
> It appears you only tested the CONFIG_PREEMPT_LAZY selection. Have you
> tested the difference of how CONFIG_PREEMPT behaves between PREEMPT_RT and
> no PREEMPT_RT? I think that will show a difference like we had in the past.

Not that I remember testing PREEMPT vs PREEMPT_RT. I remember people
complained about high networking load on RT, which became visible due
to threaded interrupts (as in top), while on non-RT it was more or less
hidden and not clearly visible due to how the time is accounted. The
network performance itself was mostly the same as far as I remember
(that is, gbit).

> I can see people picking both PREEMPT_RT and CONFIG_PREEMPT (Full), but
> then wondering why their non RT tasks are suffering from a performance
> penalty from that. We might want to opt in for lazy by default on RT.

That was the case in the RT queue until it was replaced with
PREEMPT_AUTO. But then why not use LAZY in favour of PREEMPT? Mike had
numbers at
https://lore.kernel.org/all/9df22ebbc2e6d426099bf380477a0ed885068896.camel@xxxxxx/
where LAZY had mostly the performance of VOLUNTARY with fewer context
switches than PREEMPT. That also means there is no need for
cond_resched() and friends.
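
For reference, a minimal sketch of the two-tick escalation described
above. It is illustrative only: the slice_exceeded parameter stands in
for the scheduler's "time slice used up" decision, and the real code in
kernel/sched/ is structured differently.

	#include <linux/sched.h>

	/*
	 * Hypothetical per-tick check for the current task under
	 * CONFIG_PREEMPT_LAZY.
	 */
	static void lazy_tick_sketch(struct task_struct *curr, bool slice_exceeded)
	{
		if (test_tsk_thread_flag(curr, TIF_NEED_RESCHED_LAZY)) {
			/*
			 * Second tick: the lazy bit set on the previous
			 * tick was ignored (no return to userland, no
			 * schedule() call in between), so force a real
			 * preemption point now.
			 */
			set_tsk_need_resched(curr);
			return;
		}

		if (slice_exceeded) {
			/*
			 * First tick past the slice: only set the lazy
			 * bit. The task keeps running until it returns
			 * to userland, where the bit is honoured like
			 * NEED_RESCHED.
			 */
			set_tsk_thread_flag(curr, TIF_NEED_RESCHED_LAZY);
		}
	}

> -- Steve

Sebastian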