Paul! On Mon, Nov 20 2023 at 16:38, Paul E. McKenney wrote: > But... > > Suppose we have a long-running loop in the kernel that regularly > enables preemption, but only momentarily. Then the added > rcu_flavor_sched_clock_irq() check would almost always fail, making > for extremely long grace periods. Or did I miss a change that causes > preempt_enable() to help RCU out? So first of all this is not any different from today and even with RCU_PREEMPT=y a tight loop: do { preempt_disable(); do_stuff(); preempt_enable(); } will not allow rcu_flavor_sched_clock_irq() to detect QS reliably. All it can do is to force reschedule/preemption after some time, which in turn ends up in a QS. The current NONE/VOLUNTARY models, which imply RCU_PRREMPT=n cannot do that at all because the preempt_enable() is a NOOP and there is no preemption point at return from interrupt to kernel. do { do_stuff(); } So the only thing which makes that "work" is slapping a cond_resched() into the loop: do { do_stuff(); cond_resched(); } But the whole concept behind LAZY is that the loop will always be: do { preempt_disable(); do_stuff(); preempt_enable(); } and the preempt_enable() will always be a functional preemption point. So let's look at the simple case where more than one task is runnable on a given CPU: loop() preempt_disable(); --> tick interrupt set LAZY_NEED_RESCHED preempt_enable() -> Does nothing because NEED_RESCHED is not set preempt_disable(); --> tick interrupt set NEED_RESCHED preempt_enable() preempt_schedule() schedule() report_QS() which means that on the second tick a quiesent state is reported. Whether that's really going to be a full tick which is granted that's a scheduler decision and implementation detail and not really relevant for discussing the concept. Now the problematic case is when there is only one task runnable on a given CPU because then the tick interrupt will set neither of the preemption bits. Which is fine from a scheduler perspective, but not so much from a RCU perspective. But the whole point of LAZY is to be able to enforce rescheduling at the next possible preemption point. So RCU can utilize that too: rcu_flavor_sched_clock_irq(bool user) { if (user || rcu_is_cpu_rrupt_from_idle() || !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) { rcu_qs(); return; } if (this_cpu_read(rcu_data.rcu_urgent_qs)) set_need_resched(); } So: loop() preempt_disable(); --> tick interrupt rcu_flavor_sched_clock_irq() sets NEED_RESCHED preempt_enable() preempt_schedule() schedule() report_QS() See? No magic nonsense in preempt_enable(), no cond_resched(), nothing. The above rcu_flavor_sched_clock_irq() check for rcu_data.rcu_urgent_qs is not really fundamentaly different from the check in rcu_all_gs(). The main difference is that it is bound to the tick, so the detection/action might be delayed by a tick. If that turns out to be a problem, then this stuff has far more serious issues underneath. So now you might argue that for a loop like this: do { mutex_lock(); do_stuff(); mutex_unlock(); } the ideal preemption point is post mutex_unlock(), which is where someone would mindfully (*cough*) place a cond_resched(), right? So if that turns out to matter in reality and not just by academic inspection, then we are far better off to annotate such code with: do { preempt_lazy_disable(); mutex_lock(); do_stuff(); mutex_unlock(); preempt_lazy_enable(); } and let preempt_lazy_enable() evaluate the NEED_RESCHED_LAZY bit. Then rcu_flavor_sched_clock_irq(bool user) can then use a two stage approach like the scheduler: rcu_flavor_sched_clock_irq(bool user) { if (user || rcu_is_cpu_rrupt_from_idle() || !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) { rcu_qs(); return; } if (this_cpu_read(rcu_data.rcu_urgent_qs)) { if (!need_resched_lazy())) set_need_resched_lazy(); else set_need_resched(); } } But for a start I would just use the trivial if (this_cpu_read(rcu_data.rcu_urgent_qs)) set_need_resched(); approach and see where this gets us. With the approach I suggested to Ankur, i.e. having PREEMPT_AUTO(or LAZY) as a config option we can work on the details of the AUTO and RCU_PREEMPT=n flavour up to the point where we are happy to get rid of the whole zoo of config options alltogether. Just insisting that RCU_PREEMPT=n requires cond_resched() and whatsoever is not really getting us anywhere. Thanks, tglx