Re: [PATCH v6 4/5] MCS Lock: Barrier corrections

On Mon, Nov 25, 2013 at 03:52:52PM -0800, Paul E. McKenney wrote:
> On Mon, Nov 25, 2013 at 07:27:15PM +0100, Peter Zijlstra wrote:
> > On Mon, Nov 25, 2013 at 10:02:50AM -0800, Paul E. McKenney wrote:
> > > And if the two locks are different, then the guarantee applies only
> > > when the unlock and lock are on the same CPU, in which case, as Linus
> > > noted, the xchg() on entry to the slow path does the job for us.
> > 
> > But in that case we rely on the fact that the thing is part of a
> > composite and we should no longer call it load_acquire, because frankly
> > it doesn't have acquire semantics anymore because the read can escape
> > out.
> 
> Actually, load-acquire and store-release are only required to provide
> ordering in the threads/CPUs doing the load-acquire/store-release
> operations.  It is just that we require something stronger than minimal
> load-acquire/store-release to make a Linux-kernel lock.

I suspect we're talking past one another here; but our documentation
describes ACQUIRE/RELEASE semantics such that

  RELEASE
  ACQUIRE

matches a full barrier, regardless of whether it is the same lock or
not.

If you now want to weaken this definition, that needs careful
consideration, because we actually rely on sequences like

spin_unlock(l1);
spin_lock(l2);

being full barriers.
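
To make that reliance concrete, here is the classic store-buffering
pattern (X, Y, r1, r2 and the locks are illustrative names, not taken
from any real code path):

	CPU 0				CPU 1
	-----				-----
	ACCESS_ONCE(X) = 1;		ACCESS_ONCE(Y) = 1;
	spin_unlock(&l1);		spin_unlock(&l3);
	spin_lock(&l2);			spin_lock(&l4);
	r1 = ACCESS_ONCE(Y);		r2 = ACCESS_ONCE(X);

If each unlock+lock pair is a full barrier, the outcome
r1 == 0 && r2 == 0 is forbidden; with anything weaker, each CPU's load
can be satisfied before its own store becomes visible, and both CPUs
can observe zero.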

Now granted, for lock operations we have actual atomic ops in between,
which would cure x86, but it would leave the barrier semantics
confused.
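
For reference, a sketch of the slow path in question (the shape follows
the MCS patches under discussion, but the names here are illustrative,
not the exact v6 code); on x86 the xchg() is fully ordered, which is
the atomic op doing the curing:

struct mcs_spinlock {
	struct mcs_spinlock *next;
	int locked;			/* 1 once the lock is handed to us */
};

static inline void mcs_spin_lock(struct mcs_spinlock **lock,
				 struct mcs_spinlock *node)
{
	struct mcs_spinlock *prev;

	node->locked = 0;
	node->next = NULL;

	prev = xchg(lock, node);	/* full barrier on x86 */
	if (likely(prev == NULL))
		return;			/* uncontended: lock acquired */

	ACCESS_ONCE(prev->next) = node;	/* queue behind the predecessor */
	while (!smp_load_acquire(&node->locked))
		cpu_relax();		/* spin until the hand-off */
}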

So please, either:

A) we keep the strong ACQUIRE/RELEASE semantics as currently described;
   and therefore any RELEASE+ACQUIRE pair must form a full barrier; and
   our proposed primitives are non-compliant and need strengthening; or

B) we go fudge about with the definitions.

But given the current description of our ACQUIRE barrier, we simply
cannot claim the proposed primitives are good on x86 IMO.

Also, instead of smp_store_release(), I would argue that
smp_load_acquire() is the one that needs the full barrier, even on PPC.

Because our ACQUIRE disallows loads/stores from leaking out upwards,
and both TSO and PPC lwsync allow just that, smp_load_acquire() is the
one that needs the full barrier.
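
Concretely, the strengthened variant being argued for would look
something like this (a sketch, not a proposed patch; smp_mb() stands in
for a full barrier, e.g. PPC sync):

#define smp_load_acquire(p)						\
({									\
	typeof(*(p)) ___p1 = ACCESS_ONCE(*(p));				\
	smp_mb();	/* full barrier, not an lwsync/TSO acquire */	\
	___p1;								\
})

With a full barrier there, nothing before or after the load can cross
it, so a RELEASE+ACQUIRE composite becomes a full barrier regardless of
how the RELEASE side is implemented.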

