On Wed, Sep 07, 2016 at 01:24:23PM +0200, Peter Zijlstra wrote:
> > +/*
> > + * Ordering barriers:
> > + * - Every synchronizable specified memory instruction (loads or stores or both)
> > + *   that occurs in the instruction stream before the SYNC instruction must
> > + *   reach a stage in the load/store datapath after which no instruction
> > + *   re-ordering is possible before any synchronizable specified memory
> > + *   instruction which occurs after the SYNC instruction in the instruction
> > + *   stream reaches the same stage in the load/store datapath.
> > + *
> > + * - If any memory instruction before the SYNC instruction in program order,
> > + *   generates a memory request to the external memory and any memory
> > + *   instruction after the SYNC instruction in program order also generates a
> > + *   memory request to external memory, the memory request belonging to the
> > + *   older instruction must be globally performed before the time the memory
> > + *   request belonging to the younger instruction is globally performed.
> > + *
> > + * - The barrier does not guarantee the order in which instruction fetches are
> > + *   performed.
> > + */
> > +
> > +/*
> > + * stype 0x10 - An ordering barrier that affects preceding loads and stores and
> > + * subsequent loads and stores.
> > + * Older instructions which must reach the load/store ordering point before the
> > + * SYNC instruction completes: Loads, Stores
> > + * Younger instructions which must reach the load/store ordering point only
> > + * after the SYNC instruction completes: Loads, Stores
> > + * Older instructions which must be globally performed when the SYNC instruction
> > + * completes: N/A
> > + */
> > +#define STYPE_SYNC_MB 0x10
>
> This I'm not sure of; it states that things must become globally visible
> in the order specified, but the wording leaves a fairly big hole. It
> doesn't state that things cannot be less than globally visible at
> intermediate times.
>
> To take the example from Documentation/memory-barriers.txt:
>
>     CPU 1                    CPU 2                    CPU 3
>     =======================  =======================  =======================
>             { X = 0, Y = 0 }
>     STORE X=1                LOAD X                   STORE Y=1
>                              <general barrier>        <general barrier>
>                              LOAD Y                   LOAD X
>
> Suppose that CPU 2's load from X returns 1 and its load from Y returns 0.
> This indicates that CPU 2's load from X in some sense follows CPU 1's
> store to X and that CPU 2's load from Y in some sense preceded CPU 3's
> store to Y. The question is then "Can CPU 3's load from X return 0?"
>
>
> Is it ever possible for CPU2 and CPU3 to match "SYNC 10" points but to
> disagree on their loads of X?
>
> That is, even though CPU2 and CPU3 agree on their respective past and
> future stores, the 'happens before' relation CPU1 and CPU2 have wrt. X
> is not included?
>

Now, I suspect it _is_ transitive, because CPU2's "LOAD X" must be
globally performed wrt CPU3's "LOAD X", and my interpretation of that
means that the STORE of X must be globally visible for that to be true.

But, like said, wording... so clarification would be grand.

Also, IFF "SYNC 10" is indeed transitive, you should be able to replace
smp_mb() with it unconditionally.
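
For anyone who wants to poke at the three-CPU example from userspace, here
is a rough sketch of it as a litmus test. Everything in it is an assumption
for illustration: the function and variable names are made up, pthreads
stand in for CPUs, and a C11 atomic_thread_fence(memory_order_seq_cst)
stands in for <general barrier> / smp_mb(). It does not exercise "SYNC 0x10"
unless your compiler happens to emit that for the fence, and a run that
never sees the outcome proves nothing about the architecture; it only shows
which result (r1 == 1, r2 == 0, r3 == 0) a transitive barrier has to forbid.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Shared variables from the example; all names here are illustrative. */
static atomic_int X, Y, go;
static int r1, r2, r3;  /* written by one thread each, read only after join */

/* Hold every thread at a common start gate so the accesses can overlap. */
static void wait_for_start(void)
{
    while (!atomic_load_explicit(&go, memory_order_acquire))
        ;
}

static void *cpu1(void *arg)
{
    (void)arg;
    wait_for_start();
    atomic_store_explicit(&X, 1, memory_order_relaxed);   /* STORE X=1 */
    return NULL;
}

static void *cpu2(void *arg)
{
    (void)arg;
    wait_for_start();
    r1 = atomic_load_explicit(&X, memory_order_relaxed);  /* LOAD X */
    atomic_thread_fence(memory_order_seq_cst);            /* <general barrier> */
    r2 = atomic_load_explicit(&Y, memory_order_relaxed);  /* LOAD Y */
    return NULL;
}

static void *cpu3(void *arg)
{
    (void)arg;
    wait_for_start();
    atomic_store_explicit(&Y, 1, memory_order_relaxed);   /* STORE Y=1 */
    atomic_thread_fence(memory_order_seq_cst);            /* <general barrier> */
    r3 = atomic_load_explicit(&X, memory_order_relaxed);  /* LOAD X */
    return NULL;
}

/*
 * Run the test 'iterations' times and count how often CPU 2 saw X == 1 and
 * Y == 0 while CPU 3 still saw X == 0, i.e. the non-transitive outcome.
 */
int count_non_transitive(int iterations)
{
    void *(*fn[3])(void *) = { cpu1, cpu2, cpu3 };
    int forbidden = 0;

    for (int i = 0; i < iterations; i++) {
        pthread_t t[3];

        atomic_store(&X, 0);
        atomic_store(&Y, 0);
        atomic_store(&go, 0);

        for (int k = 0; k < 3; k++) {
            int rc = pthread_create(&t[k], NULL, fn[k], NULL);
            assert(rc == 0);
            (void)rc;
        }
        atomic_store_explicit(&go, 1, memory_order_release);  /* open the gate */
        for (int k = 0; k < 3; k++)
            pthread_join(t[k], NULL);

        if (r1 == 1 && r2 == 0 && r3 == 0)
            forbidden++;
    }
    return forbidden;
}
```

Calling count_non_transitive() with a few thousand iterations from a small
main() and printing the result is enough to try it. If the barrier is
transitive the count has to stay at zero; a non-zero count would be the
"disagree on their loads of X" case asked about above.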