> >> > @@ -773,7 +773,12 @@ static void rcu_gpnum_ovf(struct rcu_node *rnp, struct rcu_data *rdp) > >> > */ > >> > static int dyntick_save_progress_counter(struct rcu_data *rdp) > >> > { > >> > - rdp->dynticks_snap = rcu_dynticks_snap(rdp->cpu); > >> > >> So for PPC, which gets the smp_mb() at the lock acquisition, this is an > >> "obvious" redundant smp_mb(). > >> > >> For the other archs, per the definition of smp_mb__after_unlock_lock() it > >> seems implied that UNLOCK+LOCK is a full memory barrier, but I wanted to > >> see it explicitly stated somewhere. From a bit of spelunking below I still > >> think it's the case, but is there a "better" source of truth? > >> > >> 01352fb81658 ("locking: Add an smp_mb__after_unlock_lock() for UNLOCK+BLOCK barrier") > >> """ > >> The Linux kernel has traditionally required that an UNLOCK+LOCK pair act as a > >> full memory barrier when either (1) that UNLOCK+LOCK pair was executed by the > >> same CPU or task, or (2) the same lock variable was used for the UNLOCK and > >> LOCK. > >> """ > >> > >> and > >> > >> https://lore.kernel.org/all/1436789704-10086-1-git-send-email-will.deacon@xxxxxxx/ > >> """ > >> This ordering guarantee is already provided without the barrier on > >> all architectures apart from PowerPC > >> """ > > > > You seem to have found the accurate informations! But I must admit > > they are hard to find and it would be welcome to document that properly, for example > > in Documentation/memory-barriers.txt > > > > I think the reason is that it's not supposed to be used outside RCU, perhaps > > because its semantics are too fragile to use for general purpose? Even that > > could be stated along in Documentation/memory-barriers.txt > > > > That's also what I suspected when I stumbled on > > 12d560f4ea87 ("rcu,locking: Privatize smp_mb__after_unlock_lock()") > > which removed the references to it from Documentation/memory-barriers.txt > > > Another thing is that its semantics are similar to smp_mb__after_spinlock() > > (itself badly documented), although slightly different. I'm not even completely > > sure how. I assume that smp_mb__after_spinlock() can be just used once to > > produce the required ordering and subsequent lock on that spinlock don't need > > to repeat the barrier to propagate the ordering against what is before the > > smp_mb__after_spinlock. However IUUC smp_mb__after_unlock_lock() has to be > > chained/repeated on all subsequent locking of the spinlock... > > IIUC (big if) the chaining is a requirement of RCU itself, per: > > 2a67e741bbbc ("rcu: Create transitive rnp->lock acquisition functions") > > * Because the rcu_nodes form a tree, the tree traversal locking will observe > * different lock values, this in turn means that an UNLOCK of one level > * followed by a LOCK of another level does not imply a full memory barrier; > * and most importantly transitivity is lost. > * > * In order to restore full ordering between tree levels, augment the regular > * lock acquire functions with smp_mb__after_unlock_lock(). > I know my remark may seem a little biased, ;-) but the semantics of smp_mb__after_unlock_lock() and smp_mb__after_spinlock() have been somehowr/formally documented in the LKMM. This means, in particular, that one can write "litmus tests" with the barriers at stake and then "run"/check such tests against the _current model. For example, (based on inline comments in include/linux/spinlock.h) $ cat after_spinlock.litmus C after_spinlock { } P0(int *x, spinlock_t *s) { spin_lock(s); WRITE_ONCE(*x, 1); spin_unlock(s); } P1(int *x, int *y, spinlock_t *s) { int r0; spin_lock(s); smp_mb__after_spinlock(); r0 = READ_ONCE(*x); WRITE_ONCE(*y, 1); spin_unlock(s); } P2(int *x, int *y) { int r1; int r2; r1 = READ_ONCE(*y); smp_rmb(); r2 = READ_ONCE(*x); } exists (1:r0=1 /\ 2:r1=1 /\ 2:r2=0) $ herd7 -conf linux-kernel.cfg after_spinlock.litmus Test after_spinlock Allowed States 7 1:r0=0; 2:r1=0; 2:r2=0; 1:r0=0; 2:r1=0; 2:r2=1; 1:r0=0; 2:r1=1; 2:r2=0; 1:r0=0; 2:r1=1; 2:r2=1; 1:r0=1; 2:r1=0; 2:r2=0; 1:r0=1; 2:r1=0; 2:r2=1; 1:r0=1; 2:r1=1; 2:r2=1; No Witnesses Positive: 0 Negative: 7 Condition exists (1:r0=1 /\ 2:r1=1 /\ 2:r2=0) Observation after_spinlock Never 0 7 Time after_spinlock 0.01 Hash=b377bde8fe3565fcdd0eb2bdfaf3351e Notice that, according to the current model at least, the state in the above "exists" clause remains forbidden _after removal of the smp_mb__after_spinlock() barrier. In this sense, if you want, the inline comment (I contributed to) is misleading/incomplete. :-/ Andrea