On Wed, Jan 27, 2016 at 06:22:04PM +0000, Will Deacon wrote:
> As much as we'd like to live in a world where RELEASE -> ACQUIRE is
> always cheaply ordered and can be used to construct UNLOCK -> LOCK
> definitions with similar guarantees, the grim reality is that this isn't
> even possible on x86 (thanks to Paul for bringing us crashing down to
> Earth).
>
> This patch handles the issue by introducing a new barrier macro,
> smp_mb__after_release_acquire, that can be placed after an ACQUIRE that
> either reads from a RELEASE or is in program-order after a RELEASE. The
> barrier upgrades the RELEASE-ACQUIRE pair to a full memory barrier,
> implying global transitivity. At the moment, it doesn't have any users,
> so its existence serves mainly as a documentation aid and a potential
> stepping stone to the reintroduction of smp_mb__after_unlock_lock() used
> by RCU.
>
> Documentation/memory-barriers.txt is updated to describe more clearly
> the ACQUIRE and RELEASE ordering in this area and to show some examples
> of the new barrier in action.
>

Maybe also add an entry in the "CPU MEMORY BARRIERS" section of
memory-barriers.txt? Something like (copy and paste from your commit log ;-)):

 (*) smp_mb__after_release_acquire();

     Placed after an ACQUIRE that either reads from a RELEASE or is in
     program-order after a RELEASE. The barrier upgrades the RELEASE-ACQUIRE
     pair to a full memory barrier, implying global transitivity.

This could give readers an overview of the usage of this barrier.

> Cc: Boqun Feng <boqun.feng@xxxxxxxxx>
> Cc: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Signed-off-by: Will Deacon <will.deacon@xxxxxxx>
> ---
>
> Based on Paul's patch to describe local vs global transitivity:
>
>   http://lkml.kernel.org/r/20160115173912.GU3818@xxxxxxxxxxxxxxxxxx
>
>  Documentation/memory-barriers.txt   | 63 +++++++++++++++++++++++++++++++++----
>  arch/ia64/include/asm/barrier.h     |  2 ++
>  arch/powerpc/include/asm/barrier.h  |  3 +-
>  arch/s390/include/asm/barrier.h     |  1 +
>  arch/sparc/include/asm/barrier_64.h |  5 +--
>  arch/x86/include/asm/barrier.h      |  2 ++
>  include/asm-generic/barrier.h       | 13 ++++++++
>  7 files changed, 80 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 2dc4ba7c8d4d..62ce096b88fa 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -454,16 +454,22 @@ And a couple of implicit varieties:
> The use of ACQUIRE and RELEASE operations generally precludes the need
> for other sorts of memory barrier (but note the exceptions mentioned in
> the subsection "MMIO write barrier"). In addition, a RELEASE+ACQUIRE
> - pair is -not- guaranteed to act as a full memory barrier. However, after
> + pair is -not- guaranteed to act as a full memory barrier without an
> + explicit smp_mb__after_release_acquire() barrier. However, after
> an ACQUIRE on a given variable, all memory accesses preceding any prior
> RELEASE on that same variable are guaranteed to be visible. In other
> - words, within a given variable's critical section, all accesses of all
> - previous critical sections for that variable are guaranteed to have
> - completed.
> + words, for a CPU executing within a given variable's critical section,
> + all accesses of all previous critical sections for that variable are
> + guaranteed to be visible to that CPU.
>
> This means that ACQUIRE acts as a minimal "acquire" operation and
> RELEASE acts as a minimal "release" operation.
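
While reading the new wording above, I found myself wanting a tiny placement
example to go with it. Purely as an illustration (the variables s, t, x, y and
the results r0, r1 below are made up, and this assumes the ACQUIRE is in
program-order after the RELEASE on the same CPU), something like:

        void writer(void)
        {
                WRITE_ONCE(x, 1);
                smp_store_release(&s, 1);               /* RELEASE */
                r0 = smp_load_acquire(&t);              /* ACQUIRE, after the RELEASE */
                smp_mb__after_release_acquire();        /* upgrade the pair to a full barrier */
                r1 = READ_ONCE(y);                      /* ordered after the store to x on all CPUs */
        }

might make the intended placement obvious at first glance.
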
>
> +A subset of the atomic operations described in atomic_ops.txt have ACQUIRE
> +and RELEASE variants in addition to fully-ordered and relaxed (no barrier
> +semantics) definitions. For compound atomics performing both a load and
> +a store, ACQUIRE semantics apply only to the load and RELEASE semantics
> +apply only to the store portion of the operation.
>
> Memory barriers are only required where there's a possibility of interaction
> between two CPUs or between a CPU and a device. If it can be guaranteed that
> @@ -1357,7 +1363,7 @@ However, the transitivity of release-acquire is local to the participating
> CPUs and does not apply to cpu3(). Therefore, the following outcome
> is possible:
>
> -        r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0
> +        r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0 && r5 == 1
>
> Although cpu0(), cpu1(), and cpu2() will see their respective reads and
> writes in order, CPUs not involved in the release-acquire chain might
> @@ -1369,10 +1375,27 @@ store to u as happening -after- cpu1()'s load from v, even though
> both cpu0() and cpu1() agree that these two operations occurred in the
> intended order.
>
> +This can be forbidden by upgrading the release-acquire relationship
> +between cpu0() and cpu1() to a full barrier using
> +smp_mb__after_release_acquire() to enforce global transitivity:
> +
> +        void cpu1(void)
> +        {
> +                r1 = smp_load_acquire(&y);
> +                smp_mb__after_release_acquire();
> +                r4 = READ_ONCE(v);
> +                r5 = READ_ONCE(u);
> +                smp_store_release(&z, 1);
> +        }
> +
> +With this addition, the previous result is forbidden and, as long as
> +r1 == 1, all CPUs must agree that cpu0()'s store to u happened before
> +cpu1()'s read from v.
> +
> However, please keep in mind that smp_load_acquire() is not magic.
> In particular, it simply reads from its argument with ordering. It does
> -not- ensure that any particular value will be read. Therefore, the
> -following outcome is possible:
> +following outcome is possible irrespective of any additional barriers:
>
>         r0 == 0 && r1 == 0 && r2 == 0 && r5 == 0
>
> @@ -1971,6 +1994,34 @@ the RELEASE would simply complete, thereby avoiding the deadlock.
> a sleep-unlock race, but the locking primitive needs to resolve
> such races properly in any case.
>
> +Where the RELEASE and ACQUIRE operations are performed by the same CPU
> +to different addresses, ordering can be enforced by an
> +smp_mb__after_release_acquire() barrier:
> +
> +        *A = a;
> +        RELEASE M
> +        ACQUIRE N
> +        smp_mb__after_release_acquire();
> +        *B = b;
> +
> +in which case, the only permitted orderings are:
> +
> +        STORE *A, RELEASE M, ACQUIRE N, STORE *B
> +        STORE *A, ACQUIRE N, RELEASE M, STORE *B
> +
> +Similarly, smp_mb__after_release_acquire() enforces full,
> +globally-transitive ordering in the case that an ACQUIRE operation on
> +one CPU reads from a RELEASE operation on another:
> +
> +        CPU 1                     CPU 2                                  CPU 3
> +        =======================   =======================                ===============
> +                { X = 0, Y = 0, M = 0 }
> +        STORE X=1                 ACQUIRE M==1                           STORE Y=1
> +        RELEASE M=1               smp_mb__after_release_acquire()        smp_mb();
> +                                  LOAD Y==0                              LOAD X=0
> +
> +This outcome is forbidden (i.e. CPU 3 must read X == 1).
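
FWIW, it might also be worth connecting the *A/*B example above back to real
locks, since that is where this barrier is most likely to end up being used.
A rough sketch (locks A and B, variables x and y and the result r0 are made
up here; spin_unlock() and spin_lock() act as the RELEASE and ACQUIRE, as
this document already describes):

        spin_lock(&A);
        WRITE_ONCE(x, 1);
        spin_unlock(&A);                        /* RELEASE M */
        spin_lock(&B);                          /* ACQUIRE N */
        smp_mb__after_release_acquire();        /* UNLOCK+LOCK now acts as a full barrier */
        r0 = READ_ONCE(y);                      /* globally ordered after the store to x */
        spin_unlock(&B);

Only a suggestion, though; the litmus tests above already cover the semantics.
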
> +
> Locks and semaphores may not provide any guarantee of ordering on UP compiled
> systems, and so cannot be counted on in such a situation to actually achieve
> anything at all - especially with respect to I/O accesses - unless combined
> diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
> index 588f1614cafc..ed803c84bd19 100644
> --- a/arch/ia64/include/asm/barrier.h
> +++ b/arch/ia64/include/asm/barrier.h
> @@ -67,6 +67,8 @@ do { \
> ___p1; \
> })
>
> +#define __smp_mb__after_release_acquire() __smp_mb()
> +
> /*
> * The group barrier in front of the rsm & ssm are necessary to ensure
> * that none of the previous instructions in the same group are
> diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> index c0deafc212b8..b12860b8c7e4 100644
> --- a/arch/powerpc/include/asm/barrier.h
> +++ b/arch/powerpc/include/asm/barrier.h
> @@ -74,7 +74,8 @@ do { \
> ___p1; \
> })
>
> -#define smp_mb__before_spinlock() smp_mb()
> +#define smp_mb__after_release_acquire() __smp_mb()
> +#define smp_mb__before_spinlock() smp_mb()
>
> #include <asm-generic/barrier.h>
>
> diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
> index 5c8db3ce61c8..ee31da604b11 100644
> --- a/arch/s390/include/asm/barrier.h
> +++ b/arch/s390/include/asm/barrier.h
> @@ -45,6 +45,7 @@ do { \
> ___p1; \
> })
>
> +#define __smp_mb__release_acquire() __smp_mb()

Should be __smp_mb__after_release_acquire(), as in the title of this patch ;-)

Regards,
Boqun

> #define __smp_mb__before_atomic() barrier()
> #define __smp_mb__after_atomic() barrier()
>
> diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
> index c9f6ee64f41d..68c9e931a933 100644
> --- a/arch/sparc/include/asm/barrier_64.h
> +++ b/arch/sparc/include/asm/barrier_64.h
> @@ -52,8 +52,9 @@ do { \
> ___p1; \
> })
>
> -#define __smp_mb__before_atomic() barrier()
> -#define __smp_mb__after_atomic() barrier()
> +#define __smp_mb__after_release_acquire() __smp_mb()
> +#define __smp_mb__before_atomic() barrier()
> +#define __smp_mb__after_atomic() barrier()
>
> #include <asm-generic/barrier.h>
>
> diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> index a584e1c50918..b24dcb7e806a 100644
> --- a/arch/x86/include/asm/barrier.h
> +++ b/arch/x86/include/asm/barrier.h
> @@ -77,6 +77,8 @@ do { \
>
> #endif
>
> +#define __smp_mb__after_release_acquire() __smp_mb()
> +
> /* Atomic operations are already serializing on x86 */
> #define __smp_mb__before_atomic() barrier()
> #define __smp_mb__after_atomic() barrier()
> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index 1cceca146905..895f5993d341 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -139,6 +139,10 @@ do { \
> })
> #endif
>
> +#ifndef __smp_mb__after_release_acquire
> +#define __smp_mb__after_release_acquire() do { } while (0)
> +#endif
> +
> #ifdef CONFIG_SMP
>
> #ifndef smp_store_mb
> @@ -161,6 +165,10 @@ do { \
> #define smp_load_acquire(p) __smp_load_acquire(p)
> #endif
>
> +#ifndef smp_mb__after_release_acquire
> +#define smp_mb__after_release_acquire() __smp_mb__after_release_acquire()
> +#endif
> +
> #else /* !CONFIG_SMP */
>
> #ifndef smp_store_mb
> @@ -194,6 +202,10 @@ do { \
> })
> #endif
>
> +#ifndef smp_mb__after_release_acquire
> +#define smp_mb__after_release_acquire() do { } while (0)
> +#endif
> +
> #endif
>
> /* Barriers for virtual machine guests when talking to an SMP host */
> @@ -206,6 +218,7 @@ do { \
> #define virt_mb__after_atomic() __smp_mb__after_atomic()
> #define virt_store_release(p, v) __smp_store_release(p, v)
> #define virt_load_acquire(p) __smp_load_acquire(p)
> +#define virt_mb__after_release_acquire() __smp_mb__after_release_acquire()
>
> #endif /* !__ASSEMBLY__ */
> #endif /* __ASM_GENERIC_BARRIER_H */
> --
> 2.1.4
>
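
One last thought on the "stepping stone" remark in the commit log: I guess the
idea is that RCU's smp_mb__after_unlock_lock() could eventually be expressed
in terms of the new barrier, roughly along the lines of (purely speculative,
not part of this patch):

        /* Speculative sketch only, not something this patch defines. */
        #define smp_mb__after_unlock_lock()     smp_mb__after_release_acquire()

but that can obviously wait until this one has settled.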