Re: [PATCH -v4 5/7] locking, arch: Update spin_unlock_wait()

Peter Zijlstra <peterz@xxxxxxxxxxxxx> · Tue, 7 Jun 2016 19:36:54 +0200

On Tue, Jun 07, 2016 at 08:45:53PM +0800, Boqun Feng wrote:
> On Tue, Jun 07, 2016 at 02:00:16PM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 07, 2016 at 07:43:15PM +0800, Boqun Feng wrote:
> > > On Mon, Jun 06, 2016 at 06:08:36PM +0200, Peter Zijlstra wrote:
> > > > diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> > > > index ce2f75e32ae1..e1c29d352e0e 100644
> > > > --- a/kernel/locking/qspinlock.c
> > > > +++ b/kernel/locking/qspinlock.c
> > > > @@ -395,6 +395,8 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> > > >  	 * pending stuff.
> > > >  	 *
> > > >  	 * p,*,* -> n,*,*
> > > > +	 *
> > > > +	 * RELEASE, such that the stores to @node must be complete.
> > > >  	 */
> > > >  	old = xchg_tail(lock, tail);
> > > >  	next = NULL;
> > > > @@ -405,6 +407,15 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> > > >  	 */
> > > >  	if (old & _Q_TAIL_MASK) {
> > > >  		prev = decode_tail(old);
> > > > +		/*
> > > > +		 * The above xchg_tail() is also load of @lock which generates,
> > > > +		 * through decode_tail(), a pointer.
> > > > +		 *
> > > > +		 * The address dependency matches the RELEASE of xchg_tail()
> > > > +		 * such that the access to @prev must happen after.
> > > > +		 */
> > > > +		smp_read_barrier_depends();
> > > 
> > > Should this barrier be put before decode_tail()? Because it's the
> > > dependency old -> prev that we want to protect here.
> > 
> > I don't think it matters one way or the other. The old->prev
> > transformation is pure; it doesn't depend on any state other than old.
> > 
> 
> But wouldn't the old -> prev transformation get broken on Alpha
> semantically in theory? Because Alpha could reorder the LOAD part of the
> xchg_tail() and decode_tail(), which results in prev pointing to another
> CPU's mcs_node.

No; I don't think this is possible. The thing Alpha needs the barrier
for is to force its two cache halves into sync, such that dependent
loads are guaranteed ordered.

There is only a single load of @old, there is no second load; the
transform into @prev is a 'pure' function:

  https://en.wikipedia.org/wiki/Pure_function

(even if the transformation needs to load data; that data is static, it
never changes, and therefore it is impossible to observe a stale value).
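
To illustrate the 'pure function' point, here is a minimal userspace
sketch. It is not the kernel's decode_tail(): the bit layout, the sketch_
names and the table sizes are assumptions made up for this example. The
result is computed only from @tail and the fixed address of a static
table, so there is no second load of shared, changing state that could
return a stale value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only: the low two bits hold the
 * node index and the remaining bits hold (cpu + 1), so that a tail of 0
 * can mean "no tail".
 */
#define SKETCH_IDX_BITS   2
#define SKETCH_IDX_MASK   ((1u << SKETCH_IDX_BITS) - 1)
#define SKETCH_NR_CPUS    4
#define SKETCH_MAX_NODES  (1u << SKETCH_IDX_BITS)

struct sketch_node {
	struct sketch_node *next;
	int locked;
};

/* Static table: its address is fixed for the lifetime of the program. */
static struct sketch_node sketch_nodes[SKETCH_NR_CPUS][SKETCH_MAX_NODES];

/* Pack a (cpu, idx) pair into a non-zero tail value. */
static uint32_t sketch_encode_tail(unsigned int cpu, unsigned int idx)
{
	return ((cpu + 1) << SKETCH_IDX_BITS) | (idx & SKETCH_IDX_MASK);
}

/*
 * Pure transformation: the returned pointer depends only on @tail.
 * Nothing here reads memory that another CPU could be modifying.
 */
static struct sketch_node *sketch_decode_tail(uint32_t tail)
{
	unsigned int cpu = (tail >> SKETCH_IDX_BITS) - 1;
	unsigned int idx = tail & SKETCH_IDX_MASK;

	assert(cpu < SKETCH_NR_CPUS);
	return &sketch_nodes[cpu][idx];
}
```

Calling sketch_decode_tail() twice with the same @tail yields the same
pointer, whichever CPU runs it and whenever it runs.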

> Though, this is fine in current code, because xchg_release() is actually
> xchg() on Alpha, and Alpha doesn't use qspinlock.

I actually have a patch that converts Alpha to use relaxed atomics :-)
But yeah, the point is entirely moot, since Alpha does not use qspinlock.

> > I put it between prev and dereferences of prev, because that's what made
> > most sense to me; but really anywhere between the load of @old and the
> > first dereference of @prev is fine I suspect.
> 
> I understand the barrier here serves more for a documentation purpose,
> and I don't want to be paranoid ;-) I'm fine with the current place, just
> thought we could put it somewhere more conforming to our memory
> model.

I think it is in accordance: there is the load of @old and there are the
loads through the dereferenced @prev; the barrier needs to be placed
somewhere between those.
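
As a rough userspace analogue of that ordering, here is a sketch using
C11 atomics. It is an assumption-laden illustration, not the qspinlock
code: the sketch_ names are hypothetical, the tail is a plain pointer
rather than an encoded word, and since C11 has no
smp_read_barrier_depends(), the acquire half of the exchange stands in
(as a stronger substitute) for the address dependency plus barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_mcs_node {
	_Atomic(struct sketch_mcs_node *) next;
	int locked;
};

/*
 * Publish @node as the new tail and link it behind the previous tail.
 * Returns the previous tail, or NULL if the queue was empty.
 */
static struct sketch_mcs_node *
sketch_enqueue(_Atomic(struct sketch_mcs_node *) *tail,
	       struct sketch_mcs_node *node)
{
	struct sketch_mcs_node *prev;

	assert(node != NULL);

	/* Initialise the node before it becomes visible to others. */
	atomic_init(&node->next, NULL);
	node->locked = 0;

	/*
	 * RELEASE half: the stores to @node above are complete before
	 * other threads can see the new tail.
	 * ACQUIRE half: accesses through @prev below happen after the
	 * load that produced @prev.
	 */
	prev = atomic_exchange_explicit(tail, node, memory_order_acq_rel);

	/* First dereference of @prev, ordered after the load above. */
	if (prev)
		atomic_store_explicit(&prev->next, node,
				      memory_order_relaxed);

	return prev;
}
```

The only requirement the sketch tries to show is the one stated above:
the ordering point sits after the load that yields @prev and before the
first access through @prev.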