Re: [PATCH v6 4/5] MCS Lock: Barrier corrections

"Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> · Fri, 22 Nov 2013 10:49:37 -0800

On Fri, Nov 22, 2013 at 04:16:00PM +0100, Ingo Molnar wrote:
> 
> * Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> 
> > On Thu, Nov 21, 2013 at 08:25:59PM -0800, Linus Torvalds wrote:
> >
> > [...]
> > 
> > > I do care deeply about reality, particularly of architectures that 
> > > actually matter. To me, a spinlock in some theoretical case is 
> > > > uninteresting, but an efficient spinlock implementation on a real 
> > > architecture is a big deal that matters a lot.
> > 
> > Agreed, reality and efficiency are the prime concerns.  Theory 
> > serves reality and efficiency, but definitely not the other way 
> > around.
> > 
> > But if we want locking primitives that don't rely solely on atomic 
> > instructions (such as the queued locks that people have been putting 
> > forward), we are going to need to wade through a fair bit of theory 
> > to make sure that they actually work on real hardware.  Subtle bugs 
> > > in locking primitives are a type of reality that I think we can both 
> > agree that we should avoid.
> > 
> > Or am I missing your point?
> 
> I think one point Linus wanted to make is that it's not true that Linux 
> has to offer a barrier and locking model that panders to the weakest 
> (and craziest!) memory ordering model amongst all the possible Linux 
> platforms - theoretical or real metal.
> 
> Instead what we want to do is to consciously, intelligently _pick_ a 
> sane, maintainable memory model and offer primitives for that - at 
> least as far as generic code is concerned. Each architecture can map 
> those primitives to the best of its abilities.
> 
> Because as we increase abstraction, as we allow more and more complex 
> memory ordering details, so do maintainability and robustness 
> decrease. So there's a very real crossover point at which 
> increased smarts will actually hurt our code in real life.
> 
> [ Same goes for compilers, we draw a line: for example we generally
>   turn off strict aliasing optimizations, or we turn off NULL pointer
>   check elimination optimizations. ]
> 
> I'm not saying this to not discuss theoretical complexities - I'm just 
> saying that the craziest memory ordering complexities are probably 
> best dealt with by agreeing not to use them ;-)

Thank you for the explanation, Ingo!  I do agree with these principles.

That said, I remain really confused.  My best guess is that you are
advising me to ask Peter to stiffen up smp_store_release() so that
it preserves the guarantee that unlock+lock provides a full barrier,
thus allowing it to be used in the queued spinlocks as well as in its
original circular-buffer use case.  But even that doesn't completely
fit because that was the direction I was going beforehand.

You see, my problem is not the "crazy ordering" DEC Alpha, Itanium,
PowerPC, or even ARM.  It is really obvious what instructions to use in
a stiffened-up smp_store_release() for those guys: "mb" for DEC Alpha,
"st.rel" for Itanium, "sync" for PowerPC, and "dmb" for ARM.  Believe it
or not, my problem is instead with good old tightly ordered x86.

We -could- just put an mfence into x86's smp_store_release() and
be done with it, but it currently looks like we get the effect of
a full memory barrier without it, at least in the special case of
the high-contention queued-lock handoff.  To repeat, it looks like we
preserve the full-memory-barrier property of unlock+lock for x86 -even-
-though- the queued-lock high-contention handoff code contains neither
atomic instructions nor memory-barrier instructions.  This is a bit
surprising to me, to say the least.  Hence my digging into the theory
to check it -- after all, we cannot prove it correct by testing it.
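
For concreteness, the high-contention handoff in question has roughly
the shape below.  This is only a sketch in C11 atomics, not the code from
the patch: the names mcs_node, mcs_handoff(), mcs_wait_for_handoff(),
and handoff_demo() are hypothetical, and the C11 release/acquire pair
stands in for smp_store_release() and its acquire-side counterpart.
On x86 both operations typically compile to ordinary MOV instructions,
with no atomic read-modify-write and no fence, which is exactly why
the full-barrier question arises.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-waiter queue node, loosely modeled on an MCS lock. */
struct mcs_node {
	_Atomic(struct mcs_node *) next;  /* successor in the queue, if any */
	atomic_bool locked;               /* set by the predecessor on handoff */
};

/* Contended unlock path: pass the lock to the already-queued successor.
 * This is a plain store with release semantics, nothing stronger. */
void mcs_handoff(struct mcs_node *next)
{
	atomic_store_explicit(&next->locked, true, memory_order_release);
}

/* Contended lock path: spin until the predecessor hands the lock over.
 * This is a plain load with acquire semantics, nothing stronger. */
void mcs_wait_for_handoff(struct mcs_node *self)
{
	while (!atomic_load_explicit(&self->locked, memory_order_acquire))
		;  /* busy-wait; kernel code would relax the CPU here */
}

/* Single-threaded walk-through of the handoff, for illustration only. */
int handoff_demo(void)
{
	struct mcs_node waiter;
	int protected_data;

	atomic_init(&waiter.next, NULL);
	atomic_init(&waiter.locked, false);

	protected_data = 42;            /* holder's critical section */
	mcs_handoff(&waiter);           /* holder releases to the waiter */
	mcs_wait_for_handoff(&waiter);  /* returns at once: already handed off */
	return protected_data;          /* waiter sees the holder's write */
}
```

Note that C11 release/acquire by itself promises only that the waiter
sees the holder's critical section.  It does not promise that
unlock+lock acts as a full barrier for third-party observers, so the
sketch shows the code shape under discussion rather than settling the
question.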

Here are some other things that you and Linus might be trying to tell me:

o	Just say "no" to queued locks.  (I am OK with this.  NAKs are
	after all easier than beating my head against memory models.)

o	Don't add store-after-conditional control dependencies to
	Documentation/memory-barriers.txt because it is too complicated.
	(I am OK with this, I suppose -- but some people really want to
	rely on them.)

o	Just add general control dependencies, because that is what
	people expect.  (I have more trouble with this because there
	is a -lot- of work needed in many projects to make this happen,
	including on ARM, but some work on x86 as well.)
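
As an illustration of the second item, a store-after-conditional
control dependency looks roughly like the sketch below.  The macros
READ_ONCE_INT() and WRITE_ONCE_INT() and the variables cd_flag and
cd_data are hypothetical stand-ins defined here for the example, in the
spirit of the kernel's volatile-access idiom.  The assumption being
relied on is that the hardware will not make the store visible before
the load that decides whether the store happens at all; a load after
the branch would get no such ordering, and the compiler must also be
kept from hoisting or merging the accesses.

```c
/* Hypothetical volatile accessors, defined locally for this sketch. */
#define READ_ONCE_INT(x)      (*(volatile int *)&(x))
#define WRITE_ONCE_INT(x, v)  (*(volatile int *)&(x) = (v))

int cd_flag;  /* written by some other CPU in the real scenario */
int cd_data;  /* written here only if cd_flag was seen nonzero */

void control_dependency_consumer(void)
{
	if (READ_ONCE_INT(cd_flag))         /* load feeds the branch...    */
		WRITE_ONCE_INT(cd_data, 1); /* ...store depends on branch  */
}
```

Documenting only this load-to-store case is the narrower option in the
list above; the third item would extend the expectation to loads after
the branch as well.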

Anything I am missing here?

							Thanx, Paul

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .