GCC in-line assembly and the removal of -mcx16

Hi all.

In GCC 7.1.0, the x86_64 specific option "-mcx16" has been removed.

As a result, the atomic intrinsics will no longer emit cmpxchg16b,
the double-word CAS instruction.

So x86 will continue to support double-word CAS through the intrinsics
(it only needs cmpxchg8b), but x86_64 will not.

As such, I now need to use in-line assembly on this platform.

I am not, however, well-informed about assembly.  I'm barely informed
about C :-)

This is the code I have:

result = 0;

__asm__ __volatile__
(
  "lock;"           /* make cmpxchg16b atomic        */
  "cmpxchg16b %0;"  /* cmpxchg16b sets ZF on success */
  "setz       %4;"  /* if ZF set, set result to 1    */

  /* output */
  : "+m" (pointer_to_destination[0]),
    "+m" (pointer_to_destination[1]),
    "+a" (pointer_to_compare[0]),
    "+d" (pointer_to_compare[1]),
    "=q" (result)

  /* input */
  : "b" (pointer_to_new_destination[0]),
    "c" (pointer_to_new_destination[1])

  /* clobbered */
  :
);

I have always found using in-line assembly in GCC difficult.  I don't
know enough to use it correctly.  I would appreciate any corrections or
advice regarding the GCC semantics for writing this code.

In particular, I'm wondering if I should be marking "memory" as
clobbered, as a locked cmpxchg16b will force a full memory barrier.
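
For reference, here is a minimal sketch of how I imagine the complete
function would look with "memory" marked as clobbered.  The function
name dwcas, the unsigned char type for result, and the merged
"lock cmpxchg16b" line are my own assumptions for illustration.  It
targets x86_64 only and assumes the destination is 16-byte aligned,
which cmpxchg16b requires.

```c
#include <stdint.h>

/* Hypothetical helper (the name and signature are illustrative only).
   Returns 1 if the swap happened, 0 otherwise.  On failure, the current
   contents of the destination are written back into compare.
   destination must be 16-byte aligned, or cmpxchg16b will fault. */
static inline unsigned char dwcas( uint64_t volatile *destination,
                                   uint64_t *compare,
                                   uint64_t const *new_destination )
{
  unsigned char result;  /* byte-sized, since setz writes an 8-bit operand */

  __asm__ __volatile__
  (
    "lock cmpxchg16b %0;"  /* compare rdx:rax with memory; if equal, store rcx:rbx */
    "setz %4;"             /* ZF is set on success, so result becomes 1            */

    /* output */
    : "+m" (destination[0]),
      "+m" (destination[1]),
      "+a" (compare[0]),
      "+d" (compare[1]),
      "=q" (result)

    /* input */
    : "b" (new_destination[0]),
      "c" (new_destination[1])

    /* clobbered */
    : "memory", "cc"
  );

  return result;
}
```

As far as I know the flags are always treated as clobbered by in-line
assembly on x86, so the "cc" entry should be harmless but redundant.
The "memory" entry is the part I am actually asking about.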

One final question relates to compiler barriers.

The more recent __atomic intrinsics do, I believe, always issue a
compiler barrier appropriate to the memory order used.

However, I think there are only three compiler barriers,
load/store/full, so there is not a set of compiler barriers matching the
memory orders.
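
As a sketch of what I mean by the __atomic intrinsics taking a memory
order, the following uses __atomic_compare_exchange_n with
acquire-release ordering on success and acquire ordering on failure.
It also uses __atomic_signal_fence, which as far as I can tell is a
compiler-only fence that takes a memory order argument, so it may be
the closest thing to a set of compiler barriers matching the memory
orders.  The function names cas_acq_rel and compiler_release_fence are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wrapper: single-word strong CAS with explicit memory orders.
   On failure, the current value of *destination is written into *compare. */
static inline bool cas_acq_rel( uint64_t *destination,
                                uint64_t *compare,
                                uint64_t new_value )
{
  return __atomic_compare_exchange_n( destination, compare, new_value,
                                      false,             /* strong, not weak */
                                      __ATOMIC_ACQ_REL,  /* order on success */
                                      __ATOMIC_ACQUIRE   /* order on failure */ );
}

/* Hypothetical wrapper: a fence that restrains the compiler with release
   semantics; it is intended for ordering against a signal handler in the
   same thread, not against other threads. */
static inline void compiler_release_fence( void )
{
  __atomic_signal_fence( __ATOMIC_RELEASE );
}
```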

I think the "__asm__ __volatile__" will inherently issue a full compiler
barrier.  Is this correct?

The older __sync built-ins do not, as I understand it, issue a
compiler barrier, and the user must issue them.  Although I may be
wrong, I understand that prior to the __atomic API, a compiler barrier
consisted of:

__asm__ __volatile__ ( "" : : : "memory" );

A full compiler barrier prevents reordering across the barrier.
However, the compiler barrier itself is, or appears to be, a line of
code.  It seems, on the face of it, that I need to issue such a barrier
both immediately above and immediately below the __sync built-in.  Is
this correct?  I suspect I am in some way very confused here.
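
To make the question concrete, this is the arrangement I have in mind,
with a barrier on each side of the __sync call.  The macro name
COMPILER_BARRIER_FULL and the function name cas_with_barriers are
hypothetical.  The GCC documentation does describe the __sync built-ins
as being, in most cases, a full barrier, so the explicit barriers here
may well be redundant; that is the part I am unsure of.

```c
#include <stdint.h>

/* Hypothetical macro name: an empty asm statement with a "memory" clobber,
   which stops the compiler moving memory accesses across it. */
#define COMPILER_BARRIER_FULL  __asm__ __volatile__ ( "" : : : "memory" )

/* Hypothetical wrapper: single-word CAS with a compiler barrier above
   and below the __sync call.  Returns 1 if the swap happened. */
static inline int cas_with_barriers( uint64_t volatile *destination,
                                     uint64_t compare,
                                     uint64_t new_value )
{
  int result;

  COMPILER_BARRIER_FULL;
  result = __sync_bool_compare_and_swap( destination, compare, new_value );
  COMPILER_BARRIER_FULL;

  return result;
}
```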



