On 04/05/17 20:04, Andrew Haley wrote:
> On 04/05/17 16:52, Toebs Douglass wrote:
>> On 04/05/17 16:21, Andrew Haley wrote:
>>> Either works.  The mappings from C++ atomics to processors are here:
>>>
>>> https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
>>
>> Ah, that is interesting, and makes a lot of sense.
>>
>> For SC, it's atomic.  For everything else, not - which means for
>> everything else, although ordering is of course guaranteed, visibility
>> is not, and we rely on the processor doing something "in a reasonable
>> time" (which might for example be long enough that things break).
>
> Umm, what?  All access modes are atomic.

We have to be careful here, because we may have different ideas about
what atomic means.  I may be completely wrong, but I think I understand
memory barriers and atomic operations; however, I do not always use the
formal terms exactly as they are used in the field.  So, here, I am not
sure what you mean by atomic.

I do think, though, that everything other than SC is not atomic (as I
use the word).  This means, for example, that the store may *never* be
seen by any other core.  In practice I'm sure this doesn't happen, but
there are no guarantees, and I think for code to be always correct, the
assumption has to be made that it is so.

>>>> I've just had a bit of an online search, as best I could, through the
>>>> GCC source code.  It looks like expand_atomic_store() does use an
>>>> atomic exchange or atomic CAS.
>>>
>>> That depends on your machine.  On mine (ARMv8) a seq.cst store uses
>>> stlr.
>>
>> I'm surprised.  I would expect that to be able to fail (because of the
>> "reasonable time").  I don't know much about ARM though (or about
>> Intel, for that matter :-)
>
> Eventually the processor will be pre-empted for some reason or the
> cache line which contains the store will be flushed because of another
> access, but it could be a long wait.  I've seen delays of thousands
> of instructions, but it could be longer.

This - cache line flushing - is not the issue I have in mind, for once
the data in question has reached a cache, it is participating in the
MESI protocol and so is visible to other processors.

The problem I have in mind is store buffers, and the fact that a store
barrier does not cause stores to complete.  The processor performing the
store will think it has issued the store, and will see the world
accordingly, *prior even to the store reaching the first level cache*,
and there is no guarantee about how long this state of affairs persists.

So if we perform a store and then a store barrier, we have nothing -
there is no guarantee that any other core has seen this store, or ever
will.  We only have a guarantee that IF a store issued *after* the store
barrier completes, then all stores prior to the barrier will be forced
to complete first (and thus honour the store barrier).

In other words, all stores which do not use LL/SC or LOCK (or an
equivalent thereof) can in effect never occur.  They're just regular
stores, with ordering constraints provided by memory barriers - I don't
think of them as atomic.  Atomic to me means the store will be forced
to complete.
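
To make concrete what I mean, here is a minimal sketch using the C11
<stdatomic.h> interface.  The instruction names in the comments are my
own reading of the mapping page and of this thread, not verified
compiler output, so take them as assumptions:

  #include <stdatomic.h>

  atomic_int flag;

  /* Relaxed store: a plain store at the instruction level (str on
     ARMv8, mov on x86, as I read the mapping page).  Nothing forces it
     out of the store buffer at any particular time.  */
  void
  store_relaxed (void)
  {
    atomic_store_explicit (&flag, 1, memory_order_relaxed);
  }

  /* Release store: on ARMv8 this should be stlr.  It orders earlier
     stores before itself, but says nothing about *when* any of them
     become visible to another core.  */
  void
  store_release (void)
  {
    atomic_store_explicit (&flag, 1, memory_order_release);
  }

  /* Seq-cst store: per the mapping page, on x86 this is mov+mfence or
     a (locked) xchg; on ARMv8, as you say, it is just stlr.  The
     locked/xchg case is what I had been calling "atomic", i.e. a store
     which is forced to complete - though, as you point out, that
     description does not hold on every target.  */
  void
  store_seq_cst (void)
  {
    atomic_store_explicit (&flag, 1, memory_order_seq_cst);
  }

My concern is with the first two cases: the barrier gives ordering
between them, but, as above, ordering alone does not give a guarantee
of visibility.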