Re: Are atomic_fetch_xxx() functions broken for atomic-pointer types ?

Chris Hall <gcc@xxxxxxx> · Thu, 5 Mar 2020 16:00:50 +0000

On 03/03/2020 17:14, Jonathan Wakely wrote:
> On Tue, 3 Mar 2020 at 17:11, Chris Hall <gcc@xxxxxxx> wrote:
>> ...
>> So given:
>>
>>     _Atomic(uint64_t*) foo ;
>>     uint64_t* bar ;
>>
>>     bar = atomic_fetch_add(&foo, 1) ;
>>
>> why do gcc 9.2/glibc 2.30 add 1 and not 8 to the address ?
>
> That's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64843

Ah.  Opened 28-Jan-2015, so only 5 years old.
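
In the meantime, one way to avoid depending on how `atomic_fetch_add()` 
treats a pointer operand is to do the pointer arithmetic in ordinary C 
and publish the result with a compare-exchange.  A minimal sketch, not 
taken from the bug report; `fetch_advance()` is a made-up helper name:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: atomically advance *p by n elements and return
 * the previous pointer.  The addition old + n is ordinary C pointer
 * arithmetic, so it is scaled by sizeof(uint64_t) regardless of what
 * atomic_fetch_add() does with pointer operands. */
static uint64_t *fetch_advance(_Atomic(uint64_t *) *p, ptrdiff_t n)
{
    assert(p != NULL);
    uint64_t *old = atomic_load(p);

    /* A failed compare-exchange reloads 'old', so simply retry. */
    while (!atomic_compare_exchange_weak(p, &old, old + n))
        ;
    return old;
}
```

With this, `bar = fetch_advance(&foo, 1)` moves `foo` on by 8 bytes.  The 
price is a retry loop, so the operation is no longer a single hardware 
read-modify-write.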

As noted a few days ago, the Standard requires that `atomic_xxx()` 
operations take `_Atomic(foo_t)*` arguments, and I believe that passing 
a `uint64_t*` is an error.  But gcc (at least on x86_64) does not 
reject it.  I believe these bugs are all related to <stdatomic.h> 
mapping the standard `atomic_xxx()` to the non-standard 
`__atomic_xxx()` builtins.
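
To make the distinction concrete, a small sketch: the standard generic 
function applied to an `_Atomic` object, next to the `__atomic_fetch_add()` 
builtin applied to a plain `uint64_t`.  The names `std_bump()` and 
`builtin_bump()` are invented for illustration, and the second function 
relies on a gcc/Clang extension, not on ISO C:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Standard C11: the object itself has an _Atomic type. */
static uint64_t std_bump(_Atomic(uint64_t) *p)
{
    assert(p != NULL);
    return atomic_fetch_add(p, 1);      /* returns the old value */
}

/* gcc/Clang extension: the builtin operates on a plain uint64_t. */
static uint64_t builtin_bump(uint64_t *p)
{
    assert(p != NULL);
    return __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST);
}
```

Both return the previous value and leave the object incremented by one; 
only the first is written against the Standard's types.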

Given the "ambiguity" in the standard, I can imagine there is little 
incentive to fix this...  And, I doubt gcc users would be happy with 
their applications suddenly doing something different or failing to 
compile, even when what they are doing is manifestly Not-Per-the-Standard.

(I note the Clang folks seem to have opted to offer a choice of "legacy" 
and "(more) standard compliant" versions.)

Similarly, I imagine that the Standard folks gain *nothing* by "fixing" 
the text if that (a) pushes existing, established implementations out of 
spec., or (b) potentially introduces interesting new ambiguities.

Using atomics correctly is hard.  The main reason for expending the 
effort is to implement "wait-free" operations -- where no thread can be 
held up (for any significant time) by any other thread.  (For the 
avoidance of doubt: this generally means that no thread will be made to 
wait for another thread which is not currently running.)

Generally, a "wait-free" atomic operation is one which reduces to some 
hardware primitive, where any lock required is automatically released if 
execution of the thread is interrupted (in the hardware sense).  But 
other ways of achieving (adequate) "wait-free" properties may also be 
implementable.
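
As an illustration of the difference, a sketch with two hypothetical 
helpers.  `count_event()` is a single `atomic_fetch_add()`, which on 
x86_64 is normally one locked instruction.  `raise_max()` is a 
compare-exchange loop: some thread always makes progress, but a given 
thread may in principle retry many times.  What either compiles to on a 
particular target is something to check, not something the Standard 
promises:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* One read-modify-write operation; returns the previous count. */
static uint64_t count_event(_Atomic(uint64_t) *counter)
{
    assert(counter != NULL);
    return atomic_fetch_add(counter, 1);
}

/* Raise *max to at least v; returns the value last observed in *max
 * before any update made by this call. */
static uint64_t raise_max(_Atomic(uint64_t) *max, uint64_t v)
{
    assert(max != NULL);
    uint64_t seen = atomic_load(max);

    /* Retry while v is still larger and the exchange did not succeed.
     * A failed compare-exchange refreshes 'seen'. */
    while (seen < v && !atomic_compare_exchange_weak(max, &seen, v))
        ;
    return seen;
}
```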

At the C language level it makes perfect sense to define _Atomic() 
objects quite generally and to do so such that implementations are not 
(unduly) constrained.  The Standard very nearly does that, but comes 
unstuck in <stdatomic.h> where the general notion of an _Atomic(struct 
foo) collides with the more specific support for simple _Atomic integers 
-- where the latter may (well) be supported in hardware.

But at a practical level, my guess is that any serious use of atomic 
operations is limited to the "wait-free" ones.  In effect, the (only) 
really useful operations are all Implementation Defined.

So it doesn't much matter that the gcc <stdatomic.h> isn't compliant. 
What matters is that the programmer can use whatever is supported by the 
x86_64, the ARM, the PowerPC or whatever machine they are writing for. 
And that is going to be operations on straightforward machine uintXX_t 
(perhaps with strict alignment requirements)... and not some exotic 
_Atomic(uintXX_t) with a different size and/or representation and/or 
alignment !

And the most practical thing to do is for gcc (and others) to retain 
compatibility with their long established bugs, and for the C Standards 
folk to concentrate their limited resources on things which matter.

But that leaves the programmer in the land of you-know-and-I-know, and 
having to assume things about current and future implementations.

IMO, what might help here is something akin to the 'lock-free' 
compile-time and run-time macros/functions, so that the programmer can 
establish what a given implementation does or does not provide.  In 
particular:

  * what integers, pointers etc. can be directly operated on atomically?

    ie, the types for which _Atomic(xxx) has the same size and
        representation as the plain xxx type.

    This is slightly complicated by the ability of some CPUs to do
    cmp/xchg for things bigger than your usual uintmax_t.

    Perhaps for these purposes the model should be based on the byte
    size of the units which can be operated on atomically (for load,
    store, xchg, cmp/xchg, op=, etc.)  ie, much like the __atomic_xxx
    builtins, unsurprisingly.

  * whether there are any special alignment requirements for the above.

  * which operations are indeed "wait-free" (for some value thereof)

    I am told that "lock-free" may or may not mean this.
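
For comparison, this is roughly what can be discovered today with what 
C11 already provides.  A sketch only: `SAME_LAYOUT()` and the functions 
are hypothetical names, and equal size and alignment is necessary but 
not sufficient for the representation to be the same.  Nothing here 
answers the "wait-free" question:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: does _Atomic(T) have the size and alignment of T? */
#define SAME_LAYOUT(T) \
    (sizeof(_Atomic(T)) == sizeof(T) && _Alignof(_Atomic(T)) == _Alignof(T))

static bool u64_same_layout(void) { return SAME_LAYOUT(uint64_t); }
static bool ptr_same_layout(void) { return SAME_LAYOUT(void *); }

/* Compile-time answer: 0 = never, 1 = sometimes, 2 = always lock-free. */
static int ptr_lock_free_class(void) { return ATOMIC_POINTER_LOCK_FREE; }

/* Run-time answer for one particular object. */
static bool ptr_lock_free_now(void)
{
    _Atomic(void *) probe = NULL;
    bool lock_free = atomic_is_lock_free(&probe);

    /* "Always lock-free" at compile time must agree at run time. */
    assert(ptr_lock_free_class() != 2 || lock_free);
    return lock_free;
}
```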

In essence, I think the Standard needs to reflect the fact that most 
practical use (at least currently) requires a great deal which is 
Implementation Defined, and the most useful thing the Standard can do is 
to carefully specify that -- so that the programmer can discover what 
they need to know, in the same way across implementations.

Perhaps the implementers can help move the Standard in the right direction ?

Chris