Re: [PATCH v5 4/4] MCS Lock: Barrier corrections

Will Deacon <will.deacon@xxxxxxx> · Wed, 13 Nov 2013 17:37:57 +0000

On Tue, Nov 12, 2013 at 05:16:33PM +0000, George Spelvin wrote:
> > On Mon, Nov 11, 2013 at 09:17:52PM +0000, Tim Chen wrote:
> >> An alternate implementation is
> >> 	while (!ACCESS_ONCE(node->locked))
> >> 		arch_mutex_cpu_relax();
> >> 	smp_load_acquire(&node->locked);
> >> 
> >> Leaving the smp_load_acquire at the end to provide appropriate barrier.
> >> Will that be acceptable?
> 
> Will Deacon <will.deacon@xxxxxxx> wrote:
> > It still doesn't solve my problem though: I want a way to avoid that busy
> > loop in some architecture-specific manner. The arch_mutex_cpu_relax() hook
> > is a start, but there is no corresponding hook on the unlock side to issue a
> > wakeup. Given a sensible relax implementation, I don't have an issue with
> > putting a load-acquire in a loop, since it shouldn't be aggressively spinning
> > anymore.
> 
> So you want something like this?
> 
> /*
>  * This is a spin-wait with acquire semantics.  That is, accesses after
>  * this are not allowed to be reordered before the load that meets
>  * the specified condition.  This requires that it end with either a
>  * load-acquire or a full smp_mb().  The optimal way to do this is likely
>  * to be architecture-dependent.  E.g. x86 MONITOR/MWAIT instructions.
>  */
> #ifndef smp_load_acquire_until
> #define smp_load_acquire_until(addr, cond) \
> 	while (!(smp_load_acquire(addr) cond)) { \
> 		do { \
> 			arch_mutex_cpu_relax(); \
> 		} while (!(ACCESS_ONCE(*(addr)) cond)); \
> 	}
> #endif
> 
> 	smp_load_acquire_until(&node->locked, != 0);
> 
> Alternative implementations:
> 
> #define smp_load_acquire_until(addr, cond) { \
> 	while (!(ACCESS_ONCE(*(addr)) cond)) \
> 		arch_mutex_cpu_relax(); \
> 	smp_mb(); }
> 
> #define smp_load_acquire_until(addr, cond) \
> 	if (!(smp_load_acquire(addr) cond)) { \
> 		do { \
> 			arch_mutex_cpu_relax(); \
> 		} while (!(ACCESS_ONCE(*(addr)) cond)); \
> 		smp_mb(); \
> 	}

Not really...

To be clear: having the load-acquire in a loop is fine, provided that
arch_mutex_cpu_relax() is something that causes the load to back off (you
mentioned the MONITOR/MWAIT instructions on x86).

On ARM, our equivalent of those instructions also has a counterpart
instruction that needs to be executed by the CPU doing the unlock. That
means we can do one of two things:

	1. Add an arch hook in the unlock path to pair with the relax()
	   call on the lock path (arch_mutex_cpu_wake() ?).

	2. Move most of the code into arch_mcs_[un]lock, like we do for
	   spinlocks.

Whilst (1) would suffice, (2) would allow further optimisation on arm64,
where we can play tricks to avoid the explicit wakeup if we can control the
way in which the lock value is written.
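
For illustration only, option (1) might look roughly like the userspace
sketch below. It uses C11 atomics and pthreads rather than kernel
primitives, and arch_mutex_cpu_wake() is a hypothetical hook that does not
exist today. Both hooks are stubbed with portable fallbacks; on ARM I would
expect the relax side to map to WFE and the wake side to DSB + SEV, but
that mapping is an assumption here and is not exercised by the code.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Lock side: portable stand-in for the arch back-off (e.g. WFE on ARM). */
static inline void arch_mutex_cpu_relax(void)
{
	sched_yield();
}

/* Unlock side: HYPOTHETICAL hook pairing with the relax above
 * (e.g. DSB + SEV on ARM).  Nothing to do in this portable fallback. */
static inline void arch_mutex_cpu_wake(void)
{
}

struct mcs_node {
	atomic_int locked;	/* set to 1 when the lock is handed over */
	int payload;		/* data published before the hand-off */
	int seen;		/* what the waiter observed after acquiring */
};

/* Waiter: load-acquire in a loop, backing off between polls. */
static void mcs_wait(struct mcs_node *node)
{
	while (!atomic_load_explicit(&node->locked, memory_order_acquire))
		arch_mutex_cpu_relax();
}

/* Unlocker: store-release the hand-off, then issue the paired wakeup. */
static void mcs_pass(struct mcs_node *node)
{
	atomic_store_explicit(&node->locked, 1, memory_order_release);
	arch_mutex_cpu_wake();
}

static void *waiter(void *arg)
{
	struct mcs_node *node = arg;

	mcs_wait(node);
	node->seen = node->payload;	/* ordered after the acquire */
	return NULL;
}

/* Runs one hand-off and returns the value the waiter observed. */
int handoff_demo(void)
{
	struct mcs_node node;
	pthread_t t;
	int rc;

	atomic_init(&node.locked, 0);
	node.payload = 0;
	node.seen = -1;

	rc = pthread_create(&t, NULL, waiter, &node);
	assert(rc == 0);
	(void)rc;

	node.payload = 42;	/* ordered before the release store */
	mcs_pass(&node);
	pthread_join(t, NULL);
	return node.seen;
}
```

The point is only the pairing: every path that sets node->locked has to be
followed by the wake hook, which is why the hook would need to live in the
generic unlock path under option (1).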

Will
