On Tue, 2013-11-19 at 11:32 -0800, Paul E. McKenney wrote:
> On Mon, Nov 11, 2013 at 08:57:27PM -0500, Waiman Long wrote:
> > On 11/11/2013 04:17 PM, Tim Chen wrote:
> > >> You could then augment that with [cmp]xchg_{acquire,release} as
> > >> appropriate.
> > >>
> > >>> +/*
> > >>>  * In order to acquire the lock, the caller should declare a local node and
> > >>>  * pass a reference of the node to this function in addition to the lock.
> > >>>  * If the lock has already been acquired, then this will proceed to spin
> > >>> @@ -37,15 +62,19 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
> > >>>  	node->locked = 0;
> > >>>  	node->next = NULL;
> > >>>
> > >>> -	prev = xchg(lock, node);
> > >>> +	/* xchg() provides a memory barrier */
> > >>> +	prev = xchg_acquire(lock, node);
> > >>>  	if (likely(prev == NULL)) {
> > >>>  		/* Lock acquired */
> > >>>  		return;
> > >>>  	}
> > >>>  	ACCESS_ONCE(prev->next) = node;
> > >>> -	smp_wmb();
> > >>> -	/* Wait until the lock holder passes the lock down */
> > >>> -	while (!ACCESS_ONCE(node->locked))
> > >>> +	/*
> > >>> +	 * Wait until the lock holder passes the lock down.
> > >>> +	 * Using smp_load_acquire() provides a memory barrier that
> > >>> +	 * ensures subsequent operations happen after the lock is acquired.
> > >>> +	 */
> > >>> +	while (!(smp_load_acquire(&node->locked)))
> > >>>  		arch_mutex_cpu_relax();
> > > An alternate implementation is
> > > 	while (!ACCESS_ONCE(node->locked))
> > > 		arch_mutex_cpu_relax();
> > > 	smp_load_acquire(&node->locked);
> > >
> > > Leaving the smp_load_acquire at the end to provide the appropriate
> > > barrier. Will that be acceptable?
> > >
> > > Tim
> >
> > I second Tim's opinion. It would be helpful to have a
> > smp_mb_load_acquire() function that provides a memory barrier with
> > load-acquire semantics. I don't think we need one for store-release
> > as that will not be in a loop.
>
> Hmmm... I guess the ACCESS_ONCE() in the smp_load_acquire() should
> prevent it from being optimized away.
> But yes, you then end up with an extra load on the critical lock
> hand-off path. And something like an smp_mb_acquire() could then be
> useful, although I believe that on all current hardware
> smp_mb_acquire() emits the same code as would an smp_mb_release():
>
> o	barrier() on TSO systems such as x86 and s390.
>
> o	lwsync instruction on powerpc. (Really old systems would
> 	want a different instruction for smp_mb_acquire(), but let's
> 	not optimize for really old systems.)
>
> o	dmb instruction on ARM.
>
> o	mf instruction on ia64.
>
> So how about an smp_mb_acquire_release() to cover both use cases?

I guess we haven't addressed Will's preference to use wfe for ARM
instead of doing a spin loop. I'd like some suggestions on how to
proceed here. Should we do arch_mcs_lock and arch_mcs_unlock, which
default to the existing mcs_lock and mcs_unlock code but allow an
architecture-specific implementation?

Tim

> This could be used to further optimize circular buffers, for example.
>
> 							Thanx, Paul
>
> > Peter, what do you think about adding that to your patch?
> >
> > -Longman
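For what it's worth, Tim's variant (spin with plain loads, then take the
acquire barrier once, in the role the proposed smp_mb_acquire() would
play) can be modeled in user space with C11 atomics. This is only an
illustrative sketch under my own naming, not the kernel code:
atomic_thread_fence(memory_order_acquire) stands in for
smp_mb_acquire(), relaxed loads stand in for ACCESS_ONCE(), and
arch_mutex_cpu_relax() is elided.

```c
#include <stdatomic.h>
#include <stddef.h>

/* User-space model of the MCS lock under discussion (illustrative
 * names, not the kernel's). */
struct mcs_node {
	_Atomic(struct mcs_node *) next;
	atomic_int locked;
};

typedef _Atomic(struct mcs_node *) mcs_lock_t;

static void mcs_spin_lock(mcs_lock_t *lock, struct mcs_node *node)
{
	struct mcs_node *prev;

	atomic_store_explicit(&node->locked, 0, memory_order_relaxed);
	atomic_store_explicit(&node->next, NULL, memory_order_relaxed);

	/* xchg_acquire(): join the queue; acquire semantics order the
	 * critical section after an uncontended acquisition. */
	prev = atomic_exchange_explicit(lock, node, memory_order_acquire);
	if (prev == NULL)
		return;		/* lock acquired */

	atomic_store_explicit(&prev->next, node, memory_order_release);

	/* Tim's variant: spin with relaxed loads, then issue a single
	 * acquire barrier so the critical section cannot be reordered
	 * above the hand-off. */
	while (!atomic_load_explicit(&node->locked, memory_order_relaxed))
		;		/* arch_mutex_cpu_relax() would go here */
	atomic_thread_fence(memory_order_acquire);
}

static void mcs_spin_unlock(mcs_lock_t *lock, struct mcs_node *node)
{
	struct mcs_node *next =
		atomic_load_explicit(&node->next, memory_order_acquire);

	if (next == NULL) {
		struct mcs_node *expected = node;

		/* No known successor: try to release outright. */
		if (atomic_compare_exchange_strong_explicit(lock, &expected,
				NULL, memory_order_release,
				memory_order_relaxed))
			return;
		/* A successor is enqueueing; wait for its next pointer. */
		while (!(next = atomic_load_explicit(&node->next,
						     memory_order_acquire)))
			;
	}
	/* Pass the lock down; this release pairs with the waiter's
	 * acquire fence above. */
	atomic_store_explicit(&next->locked, 1, memory_order_release);
}
```

Whether the extra load after the spin loop is cheaper than a
load-acquire on every loop iteration is exactly the trade-off being
debated; on TSO hardware the two compile to the same thing anyway.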