On Thu, Jul 29, 2021 at 03:58:04PM +0800, Boqun Feng wrote: > > The following litmus test, also adapted from the one supplied off-list > > by Frederic Weisbecker, models the RCU grace-period kthread detecting > > a non-idle CPU that is concurrently transitioning to idle: > > > > C dynticks-into-idle > > > > { > > DYNTICKS=1; (* Initially non-idle. *) > > } > > > > P0(int *X, int *DYNTICKS) > > { > > int dynticks; > > > > // Non-idle. > > WRITE_ONCE(*X, 1); > > dynticks = READ_ONCE(*DYNTICKS); > > smp_store_release(DYNTICKS, dynticks + 1); > > smp_mb(); > > this smp_mb() is not needed, as we rely on the release-acquire pair to > provide the ordering. > > This means that if we use different implementations (one w/ smp_mb(), > another w/o) rcu_dynticks_inc() for idle-to-nonidle and nonidle-to-idle, > we could save a smp_mb(). Thoughts? That's exactly what I wanted to propose but everybody was sober. Namely order only the RCU read side critical sections before/after idle together: READ side critical section //enter idle //exit idle smp_mb() READ side critical section instead of ordering the RCU read side critical section before idle - with the RCU idle extended quiescent state - with the RCU read side critical section after idle: READ side critical section //enter idle smp_mb(); //exit idle smp_mb() READ side critical section So the side effect now is that if the write side waits for the reader to report a quiescent state and scans its dynticks state and see it's not yet in RCU idle mode, then later on when the read side enters in RCU idle mode we expect it to see the write side updates. But after the barrier removal the reader will only see the write side update once we exit RCU idle mode. So the following may happen: P0(int *X, int *Y, int *DYNTICKS) { int y; WRITE_ONCE(*X, 1); smp_store_release(DYNTICKS, 1); // rcu_eqs_enter //smp_mb() not there anymore y = READ_ONCE(*Y); smp_store_release(DYNTICKS, 2); // rcu_eqs_exit() smp_mb(); } P1(int *X, int *Y, int *DYNTICKS) { int x; int dynticks; WRITE_ONCE(*Y, 1); smp_mb(); dynticks = smp_load_acquire(DYNTICKS); x = READ_ONCE(*X); } exists (1:x=0 /\ 0:y=0) Theoretically it shouldn't matter because the RCU idle mode isn't supposed to perform RCU reads. But theoretically again once a CPU has reported a quiescent state, any further read is expected to see the latest updates from the write side. So I don't know what to think. In practice I believe it's not a big deal because RCU idle mode code is usually a fragile path that just handles cpuidle code to put the CPU in/out low power mode. But what about dragons...