On Fri, Jul 30, 2021 at 01:56:41PM +0800, Boqun Feng wrote:
> On Thu, Jul 29, 2021 at 12:53:31PM +0200, Frederic Weisbecker wrote:
> > On Thu, Jul 29, 2021 at 03:58:04PM +0800, Boqun Feng wrote:
> > > > The following litmus test, also adapted from the one supplied off-list
> > > > by Frederic Weisbecker, models the RCU grace-period kthread detecting
> > > > a non-idle CPU that is concurrently transitioning to idle:
> > > > 
> > > > 	C dynticks-into-idle
> > > > 
> > > > 	{
> > > > 		DYNTICKS=1; (* Initially non-idle. *)
> > > > 	}
> > > > 
> > > > 	P0(int *X, int *DYNTICKS)
> > > > 	{
> > > > 		int dynticks;
> > > > 
> > > > 		// Non-idle.
> > > > 		WRITE_ONCE(*X, 1);
> > > > 		dynticks = READ_ONCE(*DYNTICKS);
> > > > 		smp_store_release(DYNTICKS, dynticks + 1);
> > > > 		smp_mb();
> > > 
> > > This smp_mb() is not needed, as we rely on the release-acquire pair to
> > > provide the ordering.
> > > 
> > > This means that if we use different implementations of
> > > rcu_dynticks_inc() (one w/ smp_mb(), another w/o) for the
> > > idle-to-nonidle and nonidle-to-idle transitions, we could save an
> > > smp_mb().  Thoughts?
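
For concreteness, the split Boqun is suggesting might look something like
the following untested sketch.  The _enter/_exit names are hypothetical,
and this assumes the rcu_dynticks_inc() currently in -rcu, which wraps a
fully ordered arch_atomic_add_return():

	// Sketch only, not actual kernel code.

	// nonidle-to-idle: the release store orders all prior accesses,
	// pairing with the grace-period kthread's acquire load of
	// ->dynticks.
	static noinstr unsigned long rcu_dynticks_inc_enter(int incby)
	{
		return arch_atomic_add_return_release(incby,
				this_cpu_ptr(&rcu_data.dynticks));
	}

	// idle-to-nonidle: keep the fully ordered atomic, which supplies
	// the equivalent of the smp_mb() ordering the update against
	// later reads.
	static noinstr unsigned long rcu_dynticks_inc_exit(int incby)
	{
		return arch_atomic_add_return(incby,
				this_cpu_ptr(&rcu_data.dynticks));
	}

The other half of the pairing is that the update side's scan of
->dynticks must be (or remain) an acquire load, as in the litmus tests
in this thread.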
> > That's exactly what I wanted to propose, but everybody was sober.
> > Namely, order only the RCU read side critical sections before/after
> > idle together:
> > 
> > 	READ side critical section
> > 	//enter idle
> > 	//exit idle
> > 	smp_mb()
> > 	READ side critical section
> > 
> > instead of ordering the RCU read side critical section before idle
> > (through the RCU idle extended quiescent state) with the RCU read side
> > critical section after idle:
> > 
> > 	READ side critical section
> > 	//enter idle
> > 	smp_mb();
> > 	//exit idle
> > 	smp_mb()
> > 	READ side critical section
> > 
> > So the side effect now is that if the write side waits for the reader
> > to report a quiescent state, scans its dynticks state, and sees that it
> > is not yet in RCU idle mode, then later on, when the read side enters
> > RCU idle mode, we expect it to see the write side updates.  But after
> > the barrier removal, the reader will only see the write side updates
> > once it exits RCU idle mode.
> > 
> > So the following may happen:
> > 
> > 	P0(int *X, int *Y, int *DYNTICKS)
> > 	{
> > 		int y;
> > 
> > 		WRITE_ONCE(*X, 1);
> > 		smp_store_release(DYNTICKS, 1); // rcu_eqs_enter
> > 		//smp_mb() not there anymore
> > 		y = READ_ONCE(*Y);
> > 		smp_store_release(DYNTICKS, 2); // rcu_eqs_exit()
> > 		smp_mb();
> > 	}
> > 
> > 	P1(int *X, int *Y, int *DYNTICKS)
> > 	{
> > 		int x;
> > 		int dynticks;
> > 
> > 		WRITE_ONCE(*Y, 1);
> > 		smp_mb();
> > 		dynticks = smp_load_acquire(DYNTICKS);
> > 		x = READ_ONCE(*X);
> > 	}
> > 
> > 	exists (1:x=0 /\ 0:y=0)
> 
> Thanks for the detailed explanation ;-)
> 
> > Theoretically it shouldn't matter, because the RCU idle mode isn't
> > supposed to perform RCU reads.  But theoretically again, once a CPU
> 
> Right, in a LOCKDEP=y kernel, rcu_read_lock_held() requires
> rcu_is_watching(), so rcu_dereference() is not allowed in idle mode,
> unless RCU_NONIDLE() or rcu_irq_enter_irqson() is used to temporarily
> exit the idle mode.
> 
> > has reported a quiescent state, any further read is expected to see
> > the latest updates from the write side.
> 
> Yes, but in your above case, doesn't P0 already reach a quiescent
> state even before the WRITE_ONCE()?  IOW, that case is similar to the
> following:
> 
> 	P0(int *X, int *Y)
> 	{
> 		int y;
> 
> 		// in QS
> 
> 		WRITE_ONCE(*X, 1);
> 		y = READ_ONCE(*Y);
> 	}
> 
> 	P1(int *X, int *Y)
> 	{
> 		int x;
> 
> 		WRITE_ONCE(*Y, 1);
> 		synchronize_rcu();
> 		x = READ_ONCE(*X);
> 	}
> 
> 	exists (1:x=0 /\ 0:y=0)
> 
> And RCU doesn't guarantee that the READ_ONCE() on P0 sees the
> WRITE_ONCE() on P1.
> 
> > So I don't know what to think.  In practice I believe it's not a big
> > deal, because RCU idle mode code is usually a fragile path that just
> > handles cpuidle code to put the CPU in/out of low power mode.  But
> > what about dragons...
> 
> My current thought is that if the cpuidle code requires some ordering
> with synchronize_rcu(), RCU_NONIDLE() should be used, and ordering can
> be guaranteed in this case (RCU_NONIDLE() has a rcu_eqs_exit() in it).
> Otherwise, it's a bug.
> 
> So it looks like we can drop that smp_mb() in rcu_eqs_enter()?  At the
> very least, we can say something in the documentation to prevent people
> from relying on the ordering between normal reads in RCU idle mode and
> synchronize_rcu().
> 
> Thoughts?

Is there a benchmark that can show a system-level difference?  My guess
is that the realtime interrupt-latency and scheduler-latency benchmarks
would have the best chance of seeing this.

							Thanx, Paul
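
------------------------------------------------------------------------

For reference, an untested litmus test sketching Boqun's RCU_NONIDLE()
point against Frederic's scenario above: the rcu_eqs_exit() implied by
RCU_NONIDLE() supplies the full barrier, so the bad outcome should be
forbidden even without an smp_mb() in rcu_eqs_enter().  The DYNTICKS
choreography follows Frederic's test; this is an illustration, not
kernel code.

	C dynticks-nonidle-read

	{
	}

	P0(int *X, int *Y, int *DYNTICKS)
	{
		int y;

		WRITE_ONCE(*X, 1);
		smp_store_release(DYNTICKS, 1); // rcu_eqs_enter(), no smp_mb()
		// RCU_NONIDLE() begin: rcu_irq_enter_irqson() exits the eqs.
		smp_store_release(DYNTICKS, 2); // rcu_eqs_exit()
		smp_mb();
		y = READ_ONCE(*Y);
		// RCU_NONIDLE() end: rcu_irq_exit_irqson() re-enters the eqs.
		smp_store_release(DYNTICKS, 3);
	}

	P1(int *X, int *Y, int *DYNTICKS)
	{
		int x;
		int dynticks;

		WRITE_ONCE(*Y, 1);
		smp_mb();
		dynticks = smp_load_acquire(DYNTICKS);
		x = READ_ONCE(*X);
	}

	exists (1:x=0 /\ 0:y=0)

Here P0's smp_mb() comes from the eqs-exit path, so P0 and P1 form a
store-buffering pattern with full barriers on both sides, and herd7
should report that the exists clause is never satisfied.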