> So I'm not actually sure how many people rely on the RCsc transitive > smp_mb() here. People certainly rely on the RELEASE semantics, and the > code itself requires the store->load ordering, together that gives us > the smp_mb() because that's simply the only barrier we have. > > And looking at smp_mb__after_spinlock() again, we really only need the > RCsc thing for rq->lock, not for the wakeups. The wakeups really only > need that RCpc RELEASE + store->load thing (which we don't have). > > So yes, smp_mb(), however the below still makes more sense to me, or am > I just being obtuse again? While trying to integrate these remarks into v1 and looking again at the comment before smp_mb__after_spinlock(), I remembered a discussion where Boqun suggested some improvements for this comment: so I wrote the commit reported at the end of this email. This raises the following two issues: 1) First, the problem of integrating the resulting comment into v1, where I've been talking about _full_ barriers associated to the wakeups fuctions but where these are actually implemented as: spin_lock(s); smp_mb__after_spinlock(); 2) Second, the problem of formalizing the requirements described in that comment (remark: informally, the LKMM currently states that the sequence "spin_lock(s); smp_mb__after_spinlock();" generates a full barrier; in particular, this sequence orders {STORE,LOAD} -> {STORE,LOAD} according to the current LKMM). For (1), I could simply replace each occurrence of "executes a full memory barrier" with "execute the sequence spin_lock(s); smp_mb__after_spinlock()"; I haven't really thought about (2) yet, but please notice that something as simple as let mb = [...] | ([W] ; po? ; [LKW] ; fencerel(After-spinlock) ; [R]) would _not_ guarantee "RCsc transitivity" ... A different approach (that could solve both problems at once) would be to follow the current formalization in LKMM and to modify the comment before smp_mb__after_spinlock() accordingly (say, informally, "it's required that that spin_lock(); smp_mb__after_spinlock() provides a full barrier"). Thoughts? Andrea >From c3648d5022bedcd356198efa65703e01541cbd3f Mon Sep 17 00:00:00 2001 From: Andrea Parri <andrea.parri@xxxxxxxxxxxxxxxxxxxx> Date: Wed, 27 Jun 2018 10:53:30 +0200 Subject: [PATCH 2/3] locking: Fix comment for smp_mb__after_spinlock() Boqun reported that the snippet described in the header comment for smp_mb__after_spinlock() can be misleading, because acquire/release chains already provide us with the underlying guarantee (due to the A-cumulativity of release). This commit fixes the comment following Boqun's example in [1]. It's worth noting here that LKMM maintainers are currently actively debating whether to enforce RCsc transitivity of locking operations "by definition" [2]; should the guarantee be enforced in the future, the requirement for smp_mb__after_spinlock() could be simplified to include only the STORE->LOAD ordering requirement. [1] http://lkml.kernel.org/r/20180312085600.aczjkpn73axzs2sb@tardis [2] http://lkml.kernel.org/r/Pine.LNX.4.44L0.1711271553490.1424-100000@xxxxxxxxxxxxxxxxxxxx http://lkml.kernel.org/r/Pine.LNX.4.44L0.1806211322160.2381-100000@xxxxxxxxxxxxxxxxxxxx Reported-and-Suggested-by: Boqun Feng <<boqun.feng@xxxxxxxxx> Signed-off-by: Andrea Parri <andrea.parri@xxxxxxxxxxxxxxxxxxxx> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> Cc: Ingo Molnar <mingo@xxxxxxxxxx> Cc: Will Deacon <will.deacon@xxxxxxx> Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> --- include/linux/spinlock.h | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h index 1e8a464358384..c74828fe8d75c 100644 --- a/include/linux/spinlock.h +++ b/include/linux/spinlock.h @@ -121,22 +121,22 @@ do { \ * * - it must ensure the critical section is RCsc. * - * The latter is important for cases where we observe values written by other - * CPUs in spin-loops, without barriers, while being subject to scheduling. + * The latter requirement guarantees that stores from two critical sections + * in different CPUs are ordered even outside the critical sections. As an + * example illustrating this property, consider the following snippet: * - * CPU0 CPU1 CPU2 + * CPU0 CPU1 CPU2 * - * for (;;) { - * if (READ_ONCE(X)) - * break; - * } - * X=1 - * <sched-out> - * <sched-in> - * r = X; + * spin_lock(s); spin_lock(s); r2 = READ_ONCE(Y); + * WRITE_ONCE(X, 1); smp_mb__after_spinlock(); smp_rmb(); + * spin_unlock(s); r1 = READ_ONCE(X); r3 = READ_ONCE(X); + * WRITE_ONCE(Y, 1); + * spin_unlock(s); * - * without transitivity it could be that CPU1 observes X!=0 breaks the loop, - * we get migrated and CPU2 sees X==0. + * Without RCsc transitivity, it is allowed that CPU0's critical section + * precedes CPU1's critical section (r1=1) and that CPU2 observes CPU1's + * store to Y (r2=1) while it does not observe CPU0's store to X (r3=0), + * despite the presence of the smp_rmb(). * * Since most load-store architectures implement ACQUIRE with an smp_mb() after * the LL/SC loop, they need no further barriers. Similarly all our TSO -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html