Re: Memory model release/acquire mode interactions of relaxed atomic operations

Toebs Douglass <toby@xxxxxxxxxxxxxx> · Sun, 7 May 2017 17:41:44 +0200

On 07/05/17 11:23, Andrew Haley wrote:
> On 05/05/17 19:00, Toebs Douglass wrote:

>> Because the CAS forces a store, the stores earlier than the store
>> barrier MUST now complete - or we violate the constraint imposed by the
>> store barrier.
> 
> This is hard work, but I have to repeat it because others may be
> misled.  CAS does not "force a store", whatever that means.  CAS does
> not do anything that is different (WRT visibility) from any other
> store.

I may *well* be wrong, as I'm about as far from knowledgeable about
processor internals as you can get, but I think this is not so.  A
normal store will pass through the store buffer mechanism.  A CAS store
(in fact LL/SC / LOCK stores in general) will not - either it will
directly bypass the store buffer mechanism, or it will effectively
bypass it, by locking it and then flushing it (I understand Intel and
SPARC both take the latter approach).
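To make the distinction concrete at the source level, here is a minimal C11 sketch of the two kinds of store being contrasted. The variable and function names are hypothetical. What each call compiles to is an assumption about typical x86-64 code generation, not something the C standard specifies: a relaxed atomic store usually becomes a plain MOV (which goes through the store buffer), and a CAS usually becomes LOCK CMPXCHG.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared location used only for illustration. */
static _Atomic int slot = 0;

/* Ordinary store: with memory_order_relaxed, x86-64 compilers typically
   emit a plain MOV, which is queued in the store buffer like any store. */
void plain_store(int v)
{
    atomic_store_explicit(&slot, v, memory_order_relaxed);
}

/* CAS store: on x86-64 this typically becomes LOCK CMPXCHG. Whether, and
   how, a locked RMW interacts with the store buffer is a hardware detail;
   the C11 memory model itself says nothing about store buffers. */
bool cas_store(int expected, int desired)
{
    return atomic_compare_exchange_strong_explicit(&slot, &expected, desired,
                                                   memory_order_relaxed,
                                                   memory_order_relaxed);
}
```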

> Paul McKenney explains all this stuff perfectly well in
> "Memory Barriers: a Hardware View for Software Hackers," including the
> key point about a full barrier needing to flush the store buffer.

I've had a bit of a search in "Hardware View" and unfortunately I can't
find an equivalent of the lovely, very direct statement in the document
"Linux Kernel Memory Barriers", which Paul co-authored.

The quote from that latter document is this:

"There is no guarantee that any of the memory accesses specified before
a memory barrier will be _complete_ by the completion of a memory
barrier instruction; the barrier can be considered to draw a line in
that CPU's access queue that accesses of the appropriate type may not
cross."

I may be wrong, but I understand this to mean that if you issue a store
and then a store barrier, the store barrier does not cause the earlier
store to complete.  It only ensures that when a store after the barrier
*does* complete, all stores prior to the barrier will have completed
first.
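The "line in the queue" reading can be sketched with C11 fences. This is an approximation under stated assumptions: a C11 release fence is stronger than a store-only barrier such as the kernel's smp_wmb() (it also orders earlier loads), but it has the same character. It constrains the *order* in which the two stores become visible and says nothing about *when* either one does. All names here are hypothetical.

```c
#include <stdatomic.h>

static int payload;            /* ordinary, non-atomic data */
static _Atomic int ready = 0;  /* flag published after the data */

void producer(void)
{
    payload = 42;
    /* Ordering only: the store to payload may not become visible after
       the flag store below. Nothing here forces either store to complete. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Returns the payload once the flag has been observed, or -1 if the flag
   is not (yet) visible. If the flag is seen, the acquire fence pairs with
   the release fence, so payload is guaranteed to read 42. */
int consumer(void)
{
    if (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        return -1;
    atomic_thread_fence(memory_order_acquire);
    return payload;
}
```

A consumer on another CPU may legitimately keep getting -1 for some time after producer() returns. That is exactly the "no guarantee of completion" point in the quote.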

In "Hardware View", Paul retreats to the conventional description of
store barriers:

"Similarly, a write memory barrier orders only stores, again on the CPU
that executes it, and again so that all stores preceding the write
memory barrier will appear to have completed before any store following
the write memory barrier."

This describes the ordering property, but it does not go on to state
explicitly that the barrier has no effect on completion (that is, on the
store reaching the point where it becomes visible to the MESI protocol).
As I say, I could not find such a direct quote in the document (although
I have not looked as hard as I possibly could - I'm in the middle of some
coding).

I have discussed the use of CAS (or other atomic operations, such as
exchange) to force a completed store (and so to force honouring of prior
store barriers) with Paul in email, and although of course it's entirely
possible there was misunderstanding, he seemed in agreement.  However, I
don't want to put words in his mouth, and it is entirely possible I am
confused in this matter, so I don't want to put too much weight on this
point.
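For reference, the pattern under discussion (a store, a store barrier, then an atomic exchange used purely to "push" the earlier store out) looks roughly like this in C11. It is a hedged sketch of the idea as described here, not an established idiom. The dummy variable and function name are hypothetical. The belief that the exchange forces completion rests on the hardware argument above (on x86-64, an atomic exchange typically compiles to XCHG, which is implicitly locked); the C11 standard makes no such promise.

```c
#include <stdatomic.h>

static _Atomic int data = 0;
/* Hypothetical variable that exists only as the exchange target. */
static _Atomic int dummy = 0;

void store_then_push(int v)
{
    atomic_store_explicit(&data, v, memory_order_relaxed);
    /* The "store barrier": orders the store above before later stores. */
    atomic_thread_fence(memory_order_release);
    /* The exchange is itself a store. Per the argument above, if an atomic
       RMW cannot complete until earlier stores have, the store to data is
       complete once this returns. This is a hardware-level assumption,
       not a C11 guarantee. */
    (void)atomic_exchange_explicit(&dummy, 0, memory_order_relaxed);
}
```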