RE: [RFC PATCH 00/86] Make the kernel preemptible

David Laight <David.Laight@xxxxxxxxxx> · Wed, 8 Nov 2023 16:29:40 +0000

From: Steven Rostedt
> Sent: 08 November 2023 15:16
> 
> On Wed, 8 Nov 2023 09:43:10 +0000
> David Laight <David.Laight@xxxxxxxxxx> wrote:
> 
> > > Policies:
> > >
> > > A - preemption=none: run to completion
> > > B - preemption=voluntary: run to completion, unless a task of higher
> > >     sched-class awaits
> > > C - preemption=full: optimized for low-latency. Preempt whenever a higher
> > >     priority task awaits.
> >
> > If you remove cond_resched() then won't both B and C require an extra IPI.
> > That is probably OK for RT tasks but could get expensive for
> > normal tasks that aren't bound to a specific cpu.
> 
> What IPI is extra?

I was thinking that you wouldn't currently need an IPI if the target cpu
was running in-kernel because nothing would happen until cond_resched()
was called.

> > I suspect C could also lead to tasks being pre-empted just before
> > they sleep (eg after waking another task).
> > There might already be mitigation for that, I'm not sure if
> > a voluntary sleep can be done in a non-pre-emptible section.
> 
> No, voluntary sleep can not be done in a preemptible section.

I'm guessing you missed out a negation in that (or s/not/only/).

I was thinking about sequences like:
	wake_up();
	...
	set_current_state(TASK_UNINTERRUPTIBLE);
	add_wait_queue();
	spin_unlock();
	schedule();

Where you really don't want to be pre-empted by the woken up task.
On non-PREEMPT_RT kernels a held spinlock disables pre-emption, so the
lock might do it - if held long enough.
Otherwise you'll need to have pre-emption disabled and enable
it just after the set_current_state().
And then quite likely disable it again after the schedule()
to balance things out.
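
A minimal userspace sketch of that ordering, not kernel code. Every name
below (struct cpu_model, the model_*() helpers, run_unguarded(),
run_guarded()) is a hypothetical stand-in, and the model assumes that a
wakeup simply marks a reschedule as pending and that the reschedule is
acted on at the next point where the pre-empt count is zero. It only
records when the pre-emption lands relative to the state change; it says
nothing about what the real scheduler does next.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these are not the kernel's definitions. */
enum task_state { TASK_RUNNING, TASK_UNINTERRUPTIBLE };

struct cpu_model {
	int preempt_count;      /* > 0 means pre-emption is disabled */
	bool need_resched;      /* a woken task is waiting for the cpu */
	enum task_state state;  /* state of the task doing the wakeup */
	int early_preemptions;  /* pre-empted while still TASK_RUNNING */
	int late_preemptions;   /* pre-empted after the state change */
};

/* Act on a pending reschedule, but only if pre-emption is enabled. */
static void preempt_point(struct cpu_model *m)
{
	if (!m->need_resched || m->preempt_count > 0)
		return;
	if (m->state == TASK_RUNNING)
		m->early_preemptions++;
	else
		m->late_preemptions++;
	m->need_resched = false;
}

static void model_preempt_disable(struct cpu_model *m)
{
	m->preempt_count++;
}

static void model_preempt_enable(struct cpu_model *m)
{
	assert(m->preempt_count > 0);
	m->preempt_count--;
	preempt_point(m);
}

/* Assumption: the wakeup itself is a possible pre-emption point. */
static void model_wake_up(struct cpu_model *m)
{
	m->need_resched = true;
	preempt_point(m);
}

static void model_set_state(struct cpu_model *m, enum task_state s)
{
	m->state = s;
	preempt_point(m);
}

/* Voluntary sleep; must not be entered with pre-emption disabled. */
static void model_schedule(struct cpu_model *m)
{
	assert(m->preempt_count == 0);
	m->need_resched = false;
	m->state = TASK_RUNNING;        /* i.e. we were woken again later */
}

/* wake_up(); set_current_state(); schedule(); with no protection. */
struct cpu_model run_unguarded(void)
{
	struct cpu_model m = { 0 };

	model_wake_up(&m);
	model_set_state(&m, TASK_UNINTERRUPTIBLE);
	model_schedule(&m);
	return m;
}

/* Same sequence inside a pre-empt disabled region. */
struct cpu_model run_guarded(void)
{
	struct cpu_model m = { 0 };

	model_preempt_disable(&m);
	model_wake_up(&m);
	model_set_state(&m, TASK_UNINTERRUPTIBLE);
	model_preempt_enable(&m);       /* just after the state change */
	model_schedule(&m);
	model_preempt_disable(&m);      /* balance the outer region */
	model_preempt_enable(&m);       /* outer region ends */
	return m;
}
```

In this model run_unguarded() records one pre-emption while the task is
still TASK_RUNNING, whereas run_guarded() records none there and leaves
the pre-empt count balanced at zero.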

So having the scheduler save the pre-empt disable count might
be useful.

	David
