Re: [PATCH v5 4/4] MCS Lock: Barrier corrections

Will Deacon <will.deacon@xxxxxxx> · Tue, 12 Nov 2013 16:08:27 +0000

On Mon, Nov 11, 2013 at 09:17:52PM +0000, Tim Chen wrote:
> On Mon, 2013-11-11 at 18:10 +0000, Will Deacon wrote:
> > On Fri, Nov 08, 2013 at 07:52:38PM +0000, Tim Chen wrote:
> > > diff --git a/kernel/locking/mcs_spinlock.c b/kernel/locking/mcs_spinlock.c
> > > index b6f27f8..df5c167 100644
> > > --- a/kernel/locking/mcs_spinlock.c
> > > +++ b/kernel/locking/mcs_spinlock.c
> > > @@ -23,6 +23,31 @@
> > >  #endif
> > >  
> > >  /*
> > > + * Fall back to use the regular atomic operations and memory barrier if
> > > + * the acquire/release versions are not defined.
> > > + */
> > > +#ifndef	xchg_acquire
> > > +# define xchg_acquire(p, v)		xchg(p, v)
> > > +#endif
> > > +
> > > +#ifndef	smp_load_acquire
> > > +# define smp_load_acquire(p)				\
> > > +	({						\
> > > +		typeof(*p) __v = ACCESS_ONCE(*(p));	\
> > > +		smp_mb();				\
> > > +		__v;					\
> > > +	})
> > > +#endif
> > > +
> > > +#ifndef smp_store_release
> > > +# define smp_store_release(p, v)		\
> > > +	do {					\
> > > +		smp_mb();			\
> > > +		ACCESS_ONCE(*(p)) = v;		\
> > > +	} while (0)
> > > +#endif
> > 
> > PeterZ already has a series implementing acquire/release accessors, so you
> > should probably take a look at that rather than rolling your own here.
> 
> Yes, we are using Peter Z's implementation here.  The above is for anything
> where smp_load_acquire and smp_store_release are *not* defined.  We can
> remove this once all architectures implement the acquire and release 
> functions as mentioned in the comments of the patch.

Right, so you can use barrier.h and asm-generic will define generic versions
(identical to the above) for you if the architecture doesn't have an
optimised variant. You don't need to reproduce that in your .c file.

> > > +	/*
> > > +	 * Wait until the lock holder passes the lock down.
> > > +	 * Using smp_load_acquire() provides a memory barrier that
> > > +	 * ensures subsequent operations happen after the lock is acquired.
> > > +	 */
> > > +	while (!(smp_load_acquire(&node->locked)))
> > >  		arch_mutex_cpu_relax();
> > 
> > After a chat with some micro-architects, I'm going to have to disagree with
> > Paul here. For architectures where acquire/release are implemented with
> > explicit barriers (similarly for simple microarchitectures), emitting
> > barriers in a loop *is* going to have an affect on overall performance,
> > since those barriers may well result in traffic outside of the core (at
> > least, on ARM).
> > 
> > Thinking more about that, the real issue here is that arch_mutex_cpu_relax()
> > doesn't have a corresponding hook on the unlock side. On ARM, for example,
> > we can enter a low-power state using the wfe instruction, but that requires
> > the unlocker to wake up the core when the lock is released.
> 
> An alternate implementation is
> 	while (!ACCESS_ONCE(node->locked))
> 		arch_mutex_cpu_relax();
> 	smp_load_acquire(&node->locked);
> 
> Leaving the smp_load_acquire at the end to provide appropriate barrier.
> Will that be acceptable?

It still doesn't solve my problem though: I want a way to avoid that busy
loop by some architecture-specific manner. The arch_mutex_cpu_relax() hook
is a start, but there is no corresponding hook on the unlock side to issue a
wakeup. Given a sensible relax implementation, I don't have an issue with
putting a load-acquire in a loop, since it shouldn't be aggresively spinning
anymore.

Will

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>