There are broadly three sets of uses of cond_resched(): 1. Calls to cond_resched() out of the goodness of our heart, otherwise known as avoiding lockup splats. 2. Open coded variants of cond_resched_lock() which call cond_resched(). 3. Retry or error handling loops, where cond_resched() is used as a quick alternative to spinning in a tight-loop. When running under a full preemption model, the cond_resched() reduces to a NOP (not even a barrier) so removing it obviously cannot matter. But considering only voluntary preemption models (for say code that has been mostly tested under those), for set-1 and set-2 the scheduler can now preempt kernel tasks running beyond their time quanta anywhere they are preemptible() [1]. Which removes any need for these explicitly placed scheduling points. The cond_resched() calls in set-3 are a little more difficult. To start with, given it's NOP character under full preemption, it never actually saved us from a tight loop. With voluntary preemption, it's not a NOP, but it might as well be -- for most workloads the scheduler does not have an interminable supply of runnable tasks on the runqueue. So, cond_resched() is useful to not get softlockup splats, but not terribly good for error handling. Ideally, these should be replaced with some kind of timed or event wait. For now we use cond_resched_stall(), which tries to schedule if possible, and executes a cpu_relax() if not. Most cases here are from set-3. Replace with cond_resched_stall(). There were a few cases (__fixup_pi_state_owner() and futex_requeue()) where we had given up a spinlock or mutex and so, a resched, if any was needed, would have happened already. Replace with cpu_relax() in one case, with nothing in the other. [1] https://lore.kernel.org/lkml/20231107215742.363031-1-ankur.a.arora@xxxxxxxxxx/ Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> Cc: Ingo Molnar <mingo@xxxxxxxxxx> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> Cc: Darren Hart <dvhart@xxxxxxxxxxxxx> Cc: Davidlohr Bueso <dave@xxxxxxxxxxxx> Cc: "André Almeida" <andrealmeid@xxxxxxxxxx> Signed-off-by: Ankur Arora <ankur.a.arora@xxxxxxxxxx> --- kernel/futex/core.c | 6 +----- kernel/futex/pi.c | 6 +++--- kernel/futex/requeue.c | 1 - kernel/futex/waitwake.c | 2 +- 4 files changed, 5 insertions(+), 10 deletions(-) diff --git a/kernel/futex/core.c b/kernel/futex/core.c index f10587d1d481..4821931fb19d 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -724,7 +724,7 @@ static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr, goto retry; case -EAGAIN: - cond_resched(); + cond_resched_stall(); goto retry; default: @@ -822,8 +822,6 @@ static void exit_robust_list(struct task_struct *curr) */ if (!--limit) break; - - cond_resched(); } if (pending) { @@ -922,8 +920,6 @@ static void compat_exit_robust_list(struct task_struct *curr) */ if (!--limit) break; - - cond_resched(); } if (pending) { void __user *uaddr = futex_uaddr(pending, futex_offset); diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c index ce2889f12375..e3f6ca4cd875 100644 --- a/kernel/futex/pi.c +++ b/kernel/futex/pi.c @@ -809,7 +809,7 @@ static int __fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q, break; case -EAGAIN: - cond_resched(); + cpu_relax(); err = 0; break; @@ -981,7 +981,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int tryl * this task might loop forever, aka. live lock. */ wait_for_owner_exiting(ret, exiting); - cond_resched(); + cond_resched_stall(); goto retry; default: goto out_unlock_put_key; @@ -1219,7 +1219,7 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int flags) return ret; pi_retry: - cond_resched(); + cond_resched_stall(); goto retry; pi_faulted: diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c index cba8b1a6a4cc..9f916162ef6e 100644 --- a/kernel/futex/requeue.c +++ b/kernel/futex/requeue.c @@ -560,7 +560,6 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2, * this task might loop forever, aka. live lock. */ wait_for_owner_exiting(ret, exiting); - cond_resched(); goto retry; default: goto out_unlock; diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c index ba01b9408203..801b1ec3625a 100644 --- a/kernel/futex/waitwake.c +++ b/kernel/futex/waitwake.c @@ -277,7 +277,7 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2, return ret; } - cond_resched(); + cond_resched_stall(); if (!(flags & FLAGS_SHARED)) goto retry_private; goto retry; -- 2.31.1