Paul E. McKenney <paulmck@xxxxxxxxxx> writes:

> On Tue, Nov 07, 2023 at 01:57:34PM -0800, Ankur Arora wrote:
>> cond_resched() is used to provide urgent quiescent states for
>> read-side critical sections on PREEMPT_RCU=n configurations.
>> This was necessary because lacking preempt_count, there was no
>> way for the tick handler to know if we were executing in RCU
>> read-side critical section or not.
>>
>> An always-on CONFIG_PREEMPT_COUNT, however, allows the tick to
>> reliably report quiescent states.
>>
>> Accordingly, evaluate preempt_count() based quiescence in
>> rcu_flavor_sched_clock_irq().
>>
>> Suggested-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
>> Signed-off-by: Ankur Arora <ankur.a.arora@xxxxxxxxxx>
>> ---
>>  kernel/rcu/tree_plugin.h |  3 ++-
>>  kernel/sched/core.c      | 15 +--------------
>>  2 files changed, 3 insertions(+), 15 deletions(-)
>>
>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>> index f87191e008ff..618f055f8028 100644
>> --- a/kernel/rcu/tree_plugin.h
>> +++ b/kernel/rcu/tree_plugin.h
>> @@ -963,7 +963,8 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
>>   */
>>  static void rcu_flavor_sched_clock_irq(int user)
>>  {
>> -	if (user || rcu_is_cpu_rrupt_from_idle()) {
>> +	if (user || rcu_is_cpu_rrupt_from_idle() ||
>> +	    !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
>
> This looks good.
>
>>  		/*
>>  		 * Get here if this CPU took its interrupt from user
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index bf5df2b866df..15db5fb7acc7 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -8588,20 +8588,7 @@ int __sched _cond_resched(void)
>>  		preempt_schedule_common();
>>  		return 1;
>>  	}
>> -	/*
>> -	 * In preemptible kernels, ->rcu_read_lock_nesting tells the tick
>> -	 * whether the current CPU is in an RCU read-side critical section,
>> -	 * so the tick can report quiescent states even for CPUs looping
>> -	 * in kernel context. In contrast, in non-preemptible kernels,
>> -	 * RCU readers leave no in-memory hints, which means that CPU-bound
>> -	 * processes executing in kernel context might never report an
>> -	 * RCU quiescent state. Therefore, the following code causes
>> -	 * cond_resched() to report a quiescent state, but only when RCU
>> -	 * is in urgent need of one.
>> -	 */
>> -#ifndef CONFIG_PREEMPT_RCU
>> -	rcu_all_qs();
>> -#endif
>
> But...
>
> Suppose we have a long-running loop in the kernel that regularly
> enables preemption, but only momentarily.  Then the added
> rcu_flavor_sched_clock_irq() check would almost always fail, making
> for extremely long grace periods.

So, my thinking was that if RCU wants to end a grace period, it would
force a context switch by setting TIF_NEED_RESCHED (and, as patch 38
mentions, RCU always uses the eager version), causing __schedule() to
call rcu_note_context_switch(). That's similar to the
preempt_schedule_common() case in _cond_resched() above.

But I see your point: RCU might just want to register a quiescent
state, and for a long-running loop like that,
rcu_flavor_sched_clock_irq() does seem to fall down.
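Just to make sure I am picturing the same thing, the loop you describe
would look something like the sketch below, where should_stop() and
do_chunk_of_work() are purely hypothetical stand-ins:

/* Illustration only: should_stop()/do_chunk_of_work() are stand-ins. */
static void long_running_loop(void)
{
	while (!should_stop()) {
		preempt_disable();
		do_chunk_of_work();	/* tick overwhelmingly lands here */
		preempt_enable();	/* only a momentary preemptible window */
	}
}

The tick then almost always observes a non-zero preempt_count(), so the
new check in rcu_flavor_sched_clock_irq() almost never gets to report a
quiescent state.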
> Or did I miss a change that causes preempt_enable() to help RCU out?

Something like this?

diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index dc5125b9c36b..e50f358f1548 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -222,6 +222,8 @@ do { \
 	barrier(); \
 	if (unlikely(preempt_count_dec_and_test())) \
 		__preempt_schedule(); \
+	if (!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) \
+		rcu_all_qs(); \
 } while (0)

Though I do wonder about the likelihood of hitting the case you
describe. Maybe, instead of adding the check on every preempt_enable(),
it would be better to force a context switch from
rcu_flavor_sched_clock_irq() (as we do in the PREEMPT_RCU=y case).
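Roughly along these lines (a completely untested sketch; the
rcu_data.rcu_urgent_qs test is only my guess at the right "RCU is
waiting on this CPU" signal, borrowing the hint that rcu_all_qs() and
rcu_note_context_switch() already key off of):

/* Untested sketch, not a patch. */
static void rcu_flavor_sched_clock_irq(int user)
{
	if (user || rcu_is_cpu_rrupt_from_idle() ||
	    !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
		/* Quiescent state directly reportable from the tick. */
		rcu_qs();
	} else if (this_cpu_read(rcu_data.rcu_urgent_qs)) {
		/*
		 * RCU urgently needs a quiescent state but this CPU
		 * cannot report one from here; force a context switch
		 * so that __schedule() calls rcu_note_context_switch().
		 */
		set_tsk_need_resched(current);
		set_preempt_need_resched();
	}
}

Thanks

--
ankur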