On 03/03/2020 17:14, Jonathan Wakely wrote:
> On Tue, 3 Mar 2020 at 17:11, Chris Hall <gcc@xxxxxxx> wrote:
...
>> So given:
>>
>>   _Atomic(uint64_t*) foo ;
>>   uint64_t* bar ;
>>   bar = atomic_fetch_add(&foo, 1) ;
>>
>> why do gcc 9.2/glibc 2.30 add 1 and not 8 to the address ?
>
> That's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64843
Ah. Opened 28-Jan-2015, so only 5 years old.
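For anyone who needs element-wise arithmetic on an atomic pointer no matter how a given <stdatomic.h> treats this case, one possible workaround is a compare-exchange loop which does the arithmetic on an ordinary pointer. This is only a sketch: the name fetch_add_elems() is made up for illustration, and the loop is lock-free at best, not wait-free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: advance *p by n elements (not n bytes) and
 * return the previous pointer. The addition is done on a plain
 * uint64_t*, so ordinary pointer arithmetic applies. */
static uint64_t* fetch_add_elems(_Atomic(uint64_t*)* p, ptrdiff_t n)
{
    assert(p != NULL);

    uint64_t* old = atomic_load(p);

    /* On failure, 'old' is refreshed with the current value and the
     * new pointer is recomputed on the next pass. */
    while (!atomic_compare_exchange_weak(p, &old, old + n))
        ;

    return old;
}
```

With foo pointing at element 0 of an array of uint64_t, fetch_add_elems(&foo, 1) returns the old pointer and leaves foo pointing at element 1, ie, sizeof(uint64_t) bytes further on.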
As noted a few days ago, the Standard requires that `atomic_xxx()`
operations take `_Atomic(foo_t)*` arguments, and I believe that passing
a `uint64_t*` is an error. But gcc (at least on x86_64) does not treat
it as one. I believe these bugs are all related to <stdatomic.h> mapping
the standard `atomic_xxx()` to the non-standard `__atomic_xxx()` builtins.
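To make the distinction concrete, here is a small sketch of the two forms side by side. The function names are invented for the example; `__atomic_fetch_add()` and `__ATOMIC_SEQ_CST` are the GCC/Clang builtin and memory-order constant, and are extensions rather than Standard C.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Standard form: the operand is a pointer to an _Atomic object. */
static uint64_t bump_standard(_Atomic(uint64_t)* counter)
{
    assert(counter != NULL);
    return atomic_fetch_add(counter, 1);
}

/* Builtin form (GCC/Clang extension): the operand is a pointer to a
 * plain uint64_t, and the memory order is given explicitly. */
static uint64_t bump_builtin(uint64_t* counter)
{
    assert(counter != NULL);
    return __atomic_fetch_add(counter, 1, __ATOMIC_SEQ_CST);
}
```

Both return the previous value and leave the counter one higher. The second works on an object which was never declared _Atomic, which is what the Standard's `atomic_xxx()` are not supposed to accept.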
Given the "ambiguity" in the standard, I can imagine there is little
incentive to fix this... And, I doubt gcc users would be happy with
their applications suddenly doing something different or failing to
compile, even when what they are doing is manifestly Not-Per-the-Standard.
(I note the Clang folks seem to have opted to offer a choice of "legacy"
and "(more) standard compliant" versions.)
Similarly, I imagine that the Standard folks gain *nothing* by "fixing"
the text if that (a) pushes existing, established implementations out of
spec., or (b) potentially introduces interesting new ambiguities.
Using atomics correctly is hard. The main reason for expending the
effort is to implement "wait-free" operations -- where no thread can be
held up (for any significant time) by any other thread. (For the
avoidance of doubt: this generally means that no thread will be made to
wait for another thread which is not currently running.)
Generally, a "wait-free" atomic operation is one which reduces to some
hardware primitive, where any lock required is automatically released if
execution of the thread is interrupted (in the hardware sense). But
other ways of achieving (adequate) "wait-free" properties may also be
implementable.
At the C language level it makes perfect sense to define _Atomic()
objects quite generally and to do so such that implementations are not
(unduly) constrained. The Standard very nearly does that, but comes
unstuck in <stdatomic.h> where the general notion of an _Atomic(struct
foo) collides with the more specific support for simple _Atomic integers
-- where the latter may (well) be supported in hardware.
But at a practical level, my guess is that any serious use of atomic
operations is limited to the "wait-free" ones. In effect, the (only)
really useful operations are all Implementation Defined.
So it doesn't much matter that the gcc <stdatomic.h> isn't compliant.
What matters is that the programmer can use whatever is supported by the
x86_64, the ARM, the PowerPC or whatever machine they are writing for.
And that is going to be operations on straightforward machine uintXX_t
(perhaps with strict alignment requirements)... and not some exotic
_Atomic(uintXX_t) with a different size and/or representation and/or
alignment !
And the most practical thing to do is for gcc (and others) to retain
compatibility with their long established bugs, and for the C Standards
folk to concentrate their limited resources on things which matter.
But that leaves the programmer in the land of you-know-and-I-know, and
having to assume things about current and future implementations.
IMO, what might help here is something akin to the 'lock-free'
compile-time and run-time macros/functions, so that the programmer can
establish what a given implementation does or does not provide. In
particular:
  * what integers, pointers etc. can be directly operated on atomically?

    ie, the types that do *not* have a distinct (size and/or
    representation) _Atomic(xxx) qualified type.

    This is slightly complicated by the ability of some CPUs to do
    cmp/xchg for things bigger than your usual uintmax_t.

    Perhaps for these purposes the model should be based on the byte
    size of the units which can be operated on atomically (for load,
    store, xchg, cmp/xchg, op=, etc.) ie, much like the __atomic_xxx
    builtins, unsurprisingly.

  * whether there are any special alignment requirements for the above.

  * which operations are indeed "wait-free" (for some value thereof)

    I am told that "lock-free" may or may not mean this.
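For comparison, this is roughly what can be discovered today, sketched for one type. The struct and function names are hypothetical. I use unsigned long long rather than a uintXX_t because the Standard's ATOMIC_xxx_LOCK_FREE macros are keyed to the basic types.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of what an implementation reveals about
 * _Atomic(unsigned long long). */
struct atomic_traits {
    bool same_size;       /* sizeof matches the plain type          */
    bool same_align;      /* _Alignof matches the plain type        */
    int  lock_free_macro; /* 0 = never, 1 = sometimes, 2 = always   */
    bool lock_free_now;   /* run-time answer for one given object   */
};

static struct atomic_traits query_ullong(void)
{
    static atomic_ullong probe;  /* only used as an operand below */
    struct atomic_traits t;

    t.same_size  = sizeof(atomic_ullong) == sizeof(unsigned long long);
    t.same_align = _Alignof(atomic_ullong) == _Alignof(unsigned long long);
    t.lock_free_macro = ATOMIC_LLONG_LOCK_FREE;
    t.lock_free_now   = atomic_is_lock_free(&probe);

    assert(t.lock_free_macro >= 0 && t.lock_free_macro <= 2);
    return t;
}
```

The size and alignment checks have to be written out by hand for each type, there is nothing for units wider than the basic types, and nothing at all says whether an operation is "wait-free".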
In essence, I think the Standard needs to reflect the fact that most
practical use (at least currently) requires a great deal which is
Implementation Defined, and the most useful thing the Standard can do is
to carefully specify that -- so that the programmer can discover what
they need to know, in the same way across implementations.
Perhaps the implementers can help move the Standard in the right direction ?
Chris