On Thu, Jul 29, 2021 at 12:53:31PM +0200, Frederic Weisbecker wrote: > On Thu, Jul 29, 2021 at 03:58:04PM +0800, Boqun Feng wrote: > > > The following litmus test, also adapted from the one supplied off-list > > > by Frederic Weisbecker, models the RCU grace-period kthread detecting > > > a non-idle CPU that is concurrently transitioning to idle: > > > > > > C dynticks-into-idle > > > > > > { > > > DYNTICKS=1; (* Initially non-idle. *) > > > } > > > > > > P0(int *X, int *DYNTICKS) > > > { > > > int dynticks; > > > > > > // Non-idle. > > > WRITE_ONCE(*X, 1); > > > dynticks = READ_ONCE(*DYNTICKS); > > > smp_store_release(DYNTICKS, dynticks + 1); > > > smp_mb(); > > > > this smp_mb() is not needed, as we rely on the release-acquire pair to > > provide the ordering. > > > > This means that if we use different implementations (one w/ smp_mb(), > > another w/o) rcu_dynticks_inc() for idle-to-nonidle and nonidle-to-idle, > > we could save a smp_mb(). Thoughts? > > That's exactly what I wanted to propose but everybody was sober. Namely order > only the RCU read side critical sections before/after idle together: > > READ side critical section > //enter idle > //exit idle > smp_mb() > READ side critical section > > instead of ordering the RCU read side critical section before idle - with the RCU > idle extended quiescent state - with the RCU read side critical section after idle: > > READ side critical section > //enter idle > smp_mb(); > //exit idle > smp_mb() > READ side critical section > > So the side effect now is that if the write side waits for the reader to > report a quiescent state and scans its dynticks state and see it's not yet in > RCU idle mode, then later on when the read side enters in RCU idle mode we > expect it to see the write side updates. > But after the barrier removal the reader will only see the write side update > once we exit RCU idle mode. > > So the following may happen: > > P0(int *X, int *Y, int *DYNTICKS) > { > int y; > > WRITE_ONCE(*X, 1); > smp_store_release(DYNTICKS, 1); // rcu_eqs_enter > //smp_mb() not there anymore > y = READ_ONCE(*Y); > smp_store_release(DYNTICKS, 2); // rcu_eqs_exit() > smp_mb(); > } > > P1(int *X, int *Y, int *DYNTICKS) > { > int x; > int dynticks; > > WRITE_ONCE(*Y, 1); > smp_mb(); > dynticks = smp_load_acquire(DYNTICKS); > x = READ_ONCE(*X); > } > > exists (1:x=0 /\ 0:y=0) > Thanks for the detailed explanation ;-) > Theoretically it shouldn't matter because the RCU idle mode isn't > supposed to perform RCU reads. But theoretically again once a CPU Right, in LOCKDEP=y kernel, rcu_read_lock_held() requires rcu_is_watching(), so rcu_dereference() is not allowed in idle mode, unless using RCU_NONIDLE() or rcu_irq_enter_irqson() to temporarily exit the idle mode. > has reported a quiescent state, any further read is expected to see > the latest updates from the write side. Yes, but in your above case, doesn't P0 already reach to a quiescent state even before WRITE_ONCE()? IOW, that case is similar to the following: P0(int *X, int *Y) { // in QS WRITE_ONCE(*X, 1); y = READ_ONCE(*Y); } P1(int *X, int *Y) { WRITE_ONCE(*Y, 1); synchronize_rcu(); x = READ_ONCE(*X); } exists (1:x=0 /\ 0:y=0) And RCU doesn't guarantee the READ_ONCE() on P0 sees the WRITE_ONCE() on P1. > > So I don't know what to think. In practice I believe it's not a big deal > because RCU idle mode code is usually a fragile path that just handles > cpuidle code to put the CPU in/out low power mode. But what about dragons... My current thought is that if the cpuidle code requires some ordering with synchronize_rcu(), RCU_NONIDLE() should be used, and ordering can be guaranteed in this case (RCU_NONIDLE() has a rcu_eqs_exit() in it). Otherwise, it's a bug. So looks like we can drop that smp_mb() in rcu_eqs_enter()? At least, we can say something in the doc to prevent people from relying on the ordering between normal reads in RCU idle mode and synchronize_rcu(). Thoughts? Regards, Boqun