On Tue, Dec 20, 2022 at 10:34:19PM -0500, Mathieu Desnoyers wrote: > On 2022-12-20 17:57, Frederic Weisbecker wrote: > > On Tue, Dec 20, 2022 at 02:01:30PM -0500, Mathieu Desnoyers wrote: > > > On 2022-12-20 13:29, Joel Fernandes wrote: > > > > > > > > > > > I do want to finish my memory barrier studies of SRCU over the holidays since I have been deep in the hole with that already. Back to the post flip memory barrier here since I think now even that might not be needed… > > > > > > I strongly suspect the memory barrier after flip is useless for the same > > > reasons I mentioned explaining why the barrier before the flip is useless. > > > > > > However, we need to double-check that we have memory barriers at the > > > beginning and end of synchronize_srcu, and between load of "unlock" counters > > > and load of "lock" counters. > > > > > > Where is the barrier at the beginning of synchronize_srcu ? > > > > rcu_seq_snap() ? > > The memory barrier in rcu_seq_snap is not at the very beginning of synchronize_srcu. > > For example we have: > > unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp) > { > // Any prior manipulation of SRCU-protected data must happen > // before the load from ->srcu_gp_seq. > smp_mb(); > return rcu_seq_snap(&ssp->srcu_gp_seq); > > which happens to have an explicit barrier before issuing rcu_seq_snap(). SRCU (or even RCU) polling is special in that it may rely on a concurrent updater to start the grace period, hence why you need two more barriers here (the second is in poll_state_synchronize_srcu()) so that: * If the polling updater started polling (calling get_state_synchronize_srcu()) before the traditional updater started the grace period, the latter must propagate the changes from the polling updater to all CPUs. * If the polling updater started polling (calling get_state_synchronize_srcu()) after the traditional updater started the grace period, it must wait for a subsequent grace period (rcu_seq_snap() will return that subsequent sequence). * If the polling updater checks (and thereby finishes) polling (calling poll_state_synchronize_srcu()) after the traditional updater completes the grace period, the polling updater sees the propagated barrier. * If the polling updater checks polling (calling poll_state_synchronize_srcu()) before the traditional updater completes the grace period, keep polling. > So my question still stands: where is the memory barrier at the beginning of > synchronize_srcu ? I still think rcu_seq_snap() is what you're looking for. > > The memory ordering constraint I am concerned about here is: > > * [...] In addition, > * each CPU having an SRCU read-side critical section that extends beyond > * the return from synchronize_srcu() is guaranteed to have executed a > * full memory barrier after the beginning of synchronize_srcu() and before > * the beginning of that SRCU read-side critical section. [...] > > So if we have a SRCU read-side critical section that begins after the beginning > of synchronize_srcu, but before its first memory barrier, it would miss the > guarantee that the full memory barrier is issued before the beginning of that > SRCU read-side critical section. IOW, that memory barrier needs to be at the > very beginning of the grace period. I'm confused, what's wrong with this ? UPDATER READER ------- ------ STORE X = 1 STORE srcu_read_lock++ // rcu_seq_snap() smp_mb() smp_mb() READ X // scans READ srcu_read_lock