Re: [PATCH v5 4/4] MCS Lock: Barrier corrections

"Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> · Tue, 19 Nov 2013 11:32:25 -0800

On Mon, Nov 11, 2013 at 08:57:27PM -0500, Waiman Long wrote:
> On 11/11/2013 04:17 PM, Tim Chen wrote:
> >>You could then augment that with [cmp]xchg_{acquire,release} as
> >>appropriate.
> >>
> >>>+/*
> >>>   * In order to acquire the lock, the caller should declare a local node and
> >>>   * pass a reference of the node to this function in addition to the lock.
> >>>   * If the lock has already been acquired, then this will proceed to spin
> >>>@@ -37,15 +62,19 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
> >>>  	node->locked = 0;
> >>>  	node->next   = NULL;
> >>>
> >>>-	prev = xchg(lock, node);
> >>>+	/* xchg() provides a memory barrier */
> >>>+	prev = xchg_acquire(lock, node);
> >>>  	if (likely(prev == NULL)) {
> >>>  		/* Lock acquired */
> >>>  		return;
> >>>  	}
> >>>  	ACCESS_ONCE(prev->next) = node;
> >>>-	smp_wmb();
> >>>-	/* Wait until the lock holder passes the lock down */
> >>>-	while (!ACCESS_ONCE(node->locked))
> >>>+	/*
> >>>+	 * Wait until the lock holder passes the lock down.
> >>>+	 * Using smp_load_acquire() provides a memory barrier that
> >>>+	 * ensures subsequent operations happen after the lock is acquired.
> >>>+	 */
> >>>+	while (!(smp_load_acquire(&node->locked)))
> >>>  		arch_mutex_cpu_relax();
> >An alternate implementation is
> >	while (!ACCESS_ONCE(node->locked))
> >		arch_mutex_cpu_relax();
> >	smp_load_acquire(&node->locked);
> >
> >Leaving the smp_load_acquire at the end to provide appropriate barrier.
> >Will that be acceptable?
> >
> >Tim
> 
> I second Tim's opinion. It will be help to have a
> smp_mb_load_acquire() function that provide a memory barrier with
> load-acquire semantic. I don't think we need one for store-release
> as that will not be in a loop.

Hmmm...  I guess the ACCESS_ONCE() in the smp_load_acquire() should
prevent it from being optimized away.  But yes, you then end up with
an extra load on the critical lock hand-off patch.  And something
like an smp_mb_acquire() could then be useful, although I believe
that on all current hardware smp_mb_acquire() emits the same code
as would an smp_mb_release():

o	barrier() on TSO systems such as x86 and s390.

o	lwsync instruction on powerpc.  (Really old systems would
	want a different instruction for smp_mb_acquire(), but let's
	not optimize for really old systems.)

o	dmb instruction on ARM.

o	mf instruction on ia64.

So how about an smp_mb_acquire_release() to cover both use cases?
This could be used to further optimize circular buffers, for example.

							Thanx, Paul

> Peter, what do you think about adding that to your patch?
> 
> -Longman
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>