Le Fri, Feb 14, 2025 at 01:01:56AM -0800, Paul E. McKenney a écrit : > On Fri, Feb 14, 2025 at 12:25:59AM +0100, Frederic Weisbecker wrote: > > A CPU coming online checks for an ongoing grace period and reports > > a quiescent state accordingly if needed. This special treatment that > > shortcuts the expedited IPI finds its origin as an optimization purpose > > on the following commit: > > > > 338b0f760e84 (rcu: Better hotplug handling for synchronize_sched_expedited() > > > > The point is to avoid an IPI while waiting for a CPU to become online > > or failing to become offline. > > > > However this is pointless and even error prone for several reasons: > > > > * If the CPU has been seen offline in the first round scanning offline > > and idle CPUs, no IPI is even tried and the quiescent state is > > reported on behalf of the CPU. > > > > * This means that if the IPI fails, the CPU just became offline. So > > it's unlikely to become online right away, unless the cpu hotplug > > operation failed and rolled back, which is a rare event that can > > wait a jiffy for a new IPI to be issued. > > > > * But then the "optimization" applying on failing CPU hotplug down only > > applies to !PREEMPT_RCU. > > > > * This force reports a quiescent state even if ->cpu_no_qs.b.exp is not > > set. As a result it can race with remote QS reports on the same rdp. > > Fortunately it happens to be OK but an accident is waiting to happen. > > > > For all those reasons, remove this optimization that doesn't look worthy > > to keep around. > > Thank you for digging into this! > > When I ran tests that removed the call to sync_sched_exp_online_cleanup() > a few months ago, I got grace-period hangs [1]. Has something changed > to make this safe? Hmm, but was it before or after "rcu: Fix get_state_synchronize_rcu_full() GP-start detection" ? And if after do we know why? Thanks.