Re: [RFC][PATCH 1/2] sched: Extended scheduler time slice

Suleiman Souhlal <suleiman@xxxxxxxxxx> · Tue, 4 Feb 2025 12:28:41 +0900

On Tue, Feb 4, 2025 at 1:45 AM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> On Mon, 3 Feb 2025 09:43:06 +0100
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> > Lazy is not the default, nor even the recommended preemption method at
> > this time.
>
> That's OK. If it is considered to be the default in the future, this can
> wait.
>
> >
> > Lazy will not ever be the only preemption method, full isn't going
> > anywhere.
>
> That's fine too, as full preemption has the same issue of preempting
> kernel mutexes. Full preemption is for something that likely doesn't want
> this feature anyway.
>
> >
> > Lazy only applies to fair (and whatever bpf things end up using
> > resched_curr_lazy()).
>
> Is that a problem? User spin locks for RT tasks are very dangerous. If an
> RT task preempts the owner that is of lower priority, it can cause a
> deadlock (if the two tasks are pinned to the same CPU). Which BTW,
> Sebastion mentioned in the Stable RT meeting that glibc supplies a
> pthread_spin_lock() and doesn't have in the man page anything about this
> possible scenario.
>
> >
> > Lazy works on tick granularity, which is variable per the HZ config, and
> > way too long for any of this nonsense.
>
> Patch 2 changes that to do what you wrote the last time. It has a max wait
> time of 50us.
>
> >
> > So by tying this to lazy, you get something that doesn't actually work
> > most of the time, and when it works, it has variable and bad behaviour.
>
> Um no. If we wait for lazy to become the default behavior, it will work
> most of the time. And when it does work, it has strict behavior of 50us.
>
> >
> > So yeah, crap.
>
> As your rationale was not correct, I will disagree with this being crap.
>
>
> >
> > This really isn't difficult to understand, and I've told you this
> > before.
>
> And I listened to what you told me before. Patch 2 implements the 50us max
> that you suggested. I separated it out because it made the code simpler to
> understand and debug. The change log even mentioned:
>
>      For the moment, it lets it run for one more tick (which will be
>      changed later).
>
> That "changed later" is the second patch in this series.
>
> With the "this can wait until lazy is default", is because we have an
> "upstream first" policy. As long as there is some buy-in to the changes, we
> can go ahead and implement it on our devices. We do not have to wait for it
> to be accepted. But if there's a strong NAK to the idea, it is much harder
> to get it implemented internally.

Can you explain why this approach requires PREEMPT_LAZY?

Could  exit_to_user_mode_loop() be changed to something like the
following (with maybe some provision to only do it once)?

if ((ti_work & _TIF_NEED_RESCHED) && !rseq_delay_resched())
    schedule();

I suppose there would also need to be some additional changes to make
sure full preemption also doesn't preempt, maybe in
preempt_schedule*().

-- Suleiman