From: Steven Rostedt
> Sent: 08 November 2023 15:16
>
> On Wed, 8 Nov 2023 09:43:10 +0000
> David Laight <David.Laight@xxxxxxxxxx> wrote:
>
> > > Policies:
> > >
> > > A - preemption=none: run to completion
> > > B - preemption=voluntary: run to completion, unless a task of higher
> > >     sched-class awaits
> > > C - preemption=full: optimized for low-latency. Preempt whenever a higher
> > >     priority task awaits.
> >
> > If you remove cond_resched() then won't both B and C require an extra IPI.
> > That is probably OK for RT tasks but could get expensive for
> > normal tasks that aren't bound to a specific cpu.
>
> What IPI is extra?

I was thinking that you wouldn't currently need an IPI if the target cpu
was running in-kernel because nothing would happen until cond_resched()
was called.

> > I suspect C could also lead to tasks being pre-empted just before
> > they sleep (eg after waking another task).
> > There might already be mitigation for that, I'm not sure if
> > a voluntary sleep can be done in a non-pre-emptible section.
>
> No, voluntary sleep can not be done in a preemptible section.

I'm guessing you missed out a negation in that (or s/not/only/).

I was thinking about sequences like:
    wake_up();
    ...
    set_current_state(TASK_UNINTERRUPTIBLE);
    add_wait_queue();
    spin_unlock();
    schedule();

Where you really don't want to be pre-empted by the woken up task.
For non CONFIG_RT the lock might do it - if held long enough.
Otherwise you'll need to have pre-emption disabled and enable
it just after the set_current_state().
And then quite likely disable again after the schedule()
to balance things out.

So having the scheduler save the pre-empt disable count might
be useful.

    David
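
To make the wake-then-sleep sequence concrete, here is a toy user-space
model in C. It is only a sketch under stated assumptions: every toy_*
name is hypothetical, nothing here is the kernel's real implementation,
and "preemption" is reduced to a counter that is bumped whenever a
pending reschedule meets a zero preempt count. The second variant models
the idea of the scheduler saving and restoring the pre-empt disable
count across the voluntary sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names are hypothetical, not kernel APIs. */
struct toy_stats {
    int involuntary;   /* times the waker was pre-empted before sleeping */
    int voluntary;     /* times the waker reached its own schedule() */
};

static int toy_preempt_count;    /* > 0 means pre-emption is disabled */
static bool toy_need_resched;    /* a woken task is waiting to run */
static struct toy_stats toy_st;

static void toy_reset(void)
{
    toy_preempt_count = 0;
    toy_need_resched = false;
    toy_st.involuntary = 0;
    toy_st.voluntary = 0;
}

/* Pre-empt the current task if a reschedule is pending and allowed. */
static void toy_preempt_point(void)
{
    if (toy_need_resched && toy_preempt_count == 0) {
        toy_need_resched = false;
        toy_st.involuntary++;
    }
}

static void toy_preempt_disable(void)
{
    toy_preempt_count++;
}

static void toy_preempt_enable(void)
{
    assert(toy_preempt_count > 0);
    toy_preempt_count--;
    toy_preempt_point();
}

/* Assumption: under a "full" policy the wakeup requests an immediate resched. */
static void toy_wake_up(void)
{
    toy_need_resched = true;
    toy_preempt_point();
}

/* Voluntary sleep; the woken task runs now, so nothing stays pending. */
static void toy_schedule(void)
{
    toy_need_resched = false;
    toy_st.voluntary++;
}

/* Hypothetical variant: the scheduler saves and restores the disable count. */
static void toy_schedule_saving_count(void)
{
    int saved = toy_preempt_count;

    toy_preempt_count = 0;
    toy_schedule();
    toy_preempt_count = saved;
}

/* wake_up(); ...; schedule(); with nothing holding off pre-emption. */
struct toy_stats toy_run_naive(void)
{
    toy_reset();
    toy_wake_up();
    toy_schedule();
    return toy_st;
}

/* Same sequence with pre-emption disabled across the wakeup and the sleep. */
struct toy_stats toy_run_guarded(void)
{
    toy_reset();
    toy_preempt_disable();
    toy_wake_up();
    toy_schedule_saving_count();
    toy_preempt_enable();
    return toy_st;
}
```

In this model toy_run_naive() records one involuntary switch followed by
the voluntary one, while toy_run_guarded() records only the voluntary
switch. It says nothing about the cost of either path in a real kernel.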