On 07/05/17 11:23, Andrew Haley wrote:
> On 05/05/17 19:00, Toebs Douglass wrote:
>> Because the CAS forces a store, the stores earlier than the store
>> barrier MUST now complete - or we violate the constraint imposed by the
>> store barrier.
>
> This is hard work, but I have to repeat it because others may be
> misled.  CAS does not "force a store", whatever that means.  CAS does
> not do anything that is different (WRT visibility) from any other
> store.

I may *well* be wrong, as I'm about as far from knowledgeable about
processor internals as you can get, but I think this is not so.

A normal store will pass through the store buffer mechanism.  A CAS
store (in fact LL/SC / LOCK stores in general) will not - either it
will directly bypass the store buffer mechanism, or it will effectively
bypass it, by locking it and then flushing it (I understand Intel and
SPARC both take the latter approach).

> Paul McKenney explains all this stuff perfectly well in
> "Memory Barriers: a Hardware View for Software Hackers," including the
> key point about a full barrier needing to flush the store buffer.

I've had a bit of a search in "Hardware View" and unfortunately I can't
find an equivalent of the lovely very direct statement from the document
"Linux Kernel Memory Barriers", which was co-authored by Paul.

The quote from that latter document is this;

"There is no guarantee that any of the memory accesses specified before
a memory barrier will be _complete_ by the completion of a memory
barrier instruction; the barrier can be considered to draw a line in
that CPU's access queue that accesses of the appropriate type may not
cross."

I may be wrong, but I understand this to mean that if you issue a store,
and then a store barrier, the store barrier does not cause the earlier
store to complete.  It only ensures that when a store after the barrier
*does* complete, all stores prior to the barrier will have completed
first.
In "Hardware View", Paul retreats to the conventional description of
store barriers;

"Similarly, a write memory barrier orders only stores, again on the CPU
that executes it, and again so that all stores preceding the write
memory barrier will appear to have completed before any store following
the write memory barrier."

This describes the ordering property, but it doesn't also go on to
explicitly state the barrier has no effect on completion (which is to
say, reaching a point where the store becomes visible to the MESI
protocol).

As I say, I could not find such a direct quote in the document (although
I have not looked as hard as I possibly could - in the middle of some
coding).

I have discussed the use of CAS (or other atomic operations, such as
exchange) to force a completed store (and so to force honouring of prior
store barriers) with Paul in email, and although of course it's entirely
possible there was misunderstanding, he seemed in agreement.

However, I don't want to put words in his mouth, and it is entirely
possible I am confused in this matter, so I don't want to put too much
weight on this point.
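
To make the pattern under discussion concrete, here is a minimal C11
sketch.  It is an illustration only, not code from either document: the
names payload, ready, dummy, publish and try_consume are hypothetical,
and since C11 has no store-only barrier I have assumed a release fence
as the nearest portable stand-in.  Whether the trailing CAS really
forces the earlier stores to complete is exactly the point in question;
the code only shows where it would sit.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* plain data, stored before the barrier      */
static atomic_int ready;   /* flag, stored after the barrier             */
static atomic_int dummy;   /* hypothetical word used only as CAS target  */

/* Writer: store, barrier, store, then a CAS on an unrelated word. */
static bool publish(int value)
{
    payload = value;

    /* Assumption: release fence standing in for a store barrier.  It
       orders the store above before the store below; it makes no
       promise about when either becomes visible to other CPUs. */
    atomic_thread_fence(memory_order_release);

    atomic_store_explicit(&ready, 1, memory_order_relaxed);

    /* The disputed step: an atomic read-modify-write issued after the
       barrier, in the hope that it pushes the earlier stores out. */
    int expected = atomic_load_explicit(&dummy, memory_order_relaxed);
    return atomic_compare_exchange_strong(&dummy, &expected, expected + 1);
}

/* Reader: if the flag is seen, the ordering guarantees payload is too. */
static bool try_consume(int *out)
{
    if (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        return false;

    /* Pairs with the release fence in publish(). */
    atomic_thread_fence(memory_order_acquire);

    *out = payload;
    return true;
}
```

Nothing in the C11 standard says the CAS makes the earlier stores
visible any sooner; if there is such an effect, it is a property of the
particular hardware, not of the language.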