On Thu, Feb 6, 2025 at 8:30 AM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> On Wed, 5 Feb 2025 22:07:12 -0500
> Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>
> >
> > > RT tasks don't have a time slice. They are affected by events. An
> > > external interrupt coming in, or a timer going off that states
> > > something is happening. Perhaps we could use this for SCHED_RR or
> > > maybe even SCHED_DEADLINE, as those do have time slices.
> > >
> > > But if it does get used, it should only be used when the task being
> > > scheduled is the same SCHED_RR priority, or if SCHED_DEADLINE will
> > > not fail its guarantees.
> > >
> >
> > Right, it would apply still to RR/DL though...
>
> But it would have to guarantee that the RR it is delaying is of the same
> priority, and that delaying the DL is not going to cause something to
> miss its deadline.

See Peter's comment: "Then pick another number; RT too has a max
scheduling latency number (on some random hardware). If you stay below
that, all is fine."

> > 3. Overloading the purpose of LAZY: My understanding is, the purpose
> > of LAZY is to let the scheduler decide if it wants to preempt based on
> > the preemption mode. It is not based on any hint, just on the
> > preemption mode. I guess you are overloading LAZY by making the LAZY
> > flag also extend the userspace time slice (versus, say, making the
> > time-slice extension hint its own thing...).
>
> I already replied about that. Note, LAZY was created in PREEMPT_RT for
> this very purpose (but in the kernel), and ported to vanilla for a
> slightly different purpose.
>
> Here's the history:
>
> PREEMPT_RT would convert spin_locks in the kernel to sleeping mutexes.
>
> This made RT tasks respond much faster to events.
>
> But non-RT (SCHED_OTHER) tasks started suffering performance issues.
>
> When looking at the performance issues, we found that they were due to
> tasks holding these sleeping spin_locks and being preempted. That is,
> the preemption of tasks holding spin_locks was causing more contention
> and slowing things down tremendously.
>
> To first handle this, adaptive mutexes were introduced. These would spin
> if the owner of the lock was still running, and would go to sleep if the
> owner went to sleep. This helped things quite a bit, but PREEMPT_RT was
> still suffering a performance deficit compared to non-RT.
>
> This was because of the timer tick on SCHED_OTHER tasks that could
> preempt a task holding a spin lock.
>
> NEED_RESCHED_LAZY was introduced to remedy this. It would be set for
> SCHED_OTHER tasks and NEED_RESCHED for RT tasks. If the task was holding
> a sleeping spin lock, NEED_RESCHED_LAZY would not preempt the running
> task, but NEED_RESCHED would. If the SCHED_OTHER task was not holding a
> sleeping spin_lock, it would be preempted regardless.
>
> This improved the performance of SCHED_OTHER tasks in PREEMPT_RT to be
> as good as what was in vanilla.
>
> You see, LAZY was *created* for this purpose: letting the scheduler know
> that the running task is in a critical section and that the timer tick
> should not preempt a SCHED_OTHER task.
>
> I just wanted to extend this to SCHED_OTHER in user space too.

Currently it does not "let anyone know" it is running in a critical
section, though. Various paths (update_curr(), wakeup) just do a 'lazy'
resched until the timer tick has elapsed, or the task returns to user
mode/idle, at which point schedule() is called. And it does this only for
FAIR tasks. That can well happen even if the currently running task is
not in a critical section in the kernel at all. Sure, it may benefit
critical sections in the upstream kernel, but where is that explicit?

I still feel we should not overload this in-kernel mechanism for
userspace locking and complicate things.

> > Yes, I have worked on RT projects before -- you would know better
> > than anyone. :-D. But admittedly, I haven't gotten to work much with
> > PREEMPT_RT systems.
>
> Just using the RT policy to improve performance is not an RT project.
> I'm talking about projects where, if you miss a deadline, things crash;
> where the project works very hard to make sure everything works as
> intended.

No, no, no -- I have done way more than just applying the RT policy, so
that means you do not know me that well ;-). I have worked on audio
driver latency, low-latency audio, latency issues in the vmalloc code,
preempt tracers, irq tracepoints, wakeup latency tracers, and various
scheduler overhead debugging -- many of those issues dealt with
sub-millisecond latency. I also dealt with CPU idle issues in the
hardware causing real-time latency problems (see my past talks if
interested). I was partly a hardware engineer when I started my career
and have built circuits. I have Electronics and Computer Engineering
degrees.

 - Joel