Hej, all.

I would like to write a little about the new libatomic mechanism for double-word CAS on 64-bit platforms, since I've ended up not using it, and I suspect that is not what the developers would have wished for their hard work.

Back in early 2005, AMD released some of its early 64-bit processors lacking support for double-word CAS. GCC, as I understand it, laudably looks to support a wide range of platforms, and as such introduced the -mcx16 switch, so the user could indicate the presence of double-word CAS support on x86_64.

The way atomics worked was that GCC provided the __sync and then later the __atomic intrinsic APIs, wholly independently, and always actually emitted inline instructions.

Fast forward to today and the release of 7.1.0, and we see the mechanism for supporting these platforms has changed. The -mcx16 switch is not used in the new mechanism; rather, GCC now depends on libatomic, libatomic depends on ifunc support, GCC calls libatomic for double-word CAS, and libatomic is designed to select the best implementation of double-word CAS available on the current platform.

On the face of it, this seems entirely reasonable: generic (not just for x86_64), more flexible, less intrusive (no need for a special switch), and it provides alternative mechanisms on platforms lacking double-word CAS. Code can continue to compile, link and run.

I observe however there are some costs.

1. GCC now depends upon libatomic. My code base consists of data structures only and it targets a bare C compiler (not just freestanding - bare). I think libatomic is probably like libgcc - it should be considered part of the bare compiler - but whereas I suspect libgcc really is available everywhere you get GCC, I have no idea if this is yet the case for libatomic, or if it might be available but only partially implemented. (The unclosed bug about libatomic not initializing its ifuncs correctly on static builds is on my mind.)

2. libatomic seems to depend upon ifuncs.
I do not know how widely supported ifuncs are. If someone takes my code to their odd little 17-bit dishwasher, with their port of GCC which only has support for static linking, will they have ifuncs?

3. libatomic silently substitutes alternatives. For my use case, only a lock-free instruction can be used. There are no alternatives. If a lock-free instruction is not available, the platform does not in fact support my code. If the code compiles and links when it should not, the user is misled. He may even unwittingly use the code, given that the test suite will in fact pass, and the benchmark is harder to port and so may not be ported. It is not clear to me how I can tell if libatomic is using a lock-free instruction. I can perhaps check in my own builds, by inspecting the assembly, but what about end-user builds?

4. A library call is now made every time a double-word CAS is used. This cost is small (one jump instruction I believe, after the initial lookup work), but it *directly* opposes a primary design goal (performance) and so is as costly as it is able to be. The code base elsewhere is carefully designed to avoid overheads, the only other function call being the one into the data structure API itself.

What I would observe then about these costs is that GCC's view is longer and wider than mine as a *user* of GCC. It is a sensible cost/benefit trade-off for GCC, but not for me, as a user of GCC. Being wholly unimportant, I am able to dismiss these early AMD processors: so few people use my code that no-one will have them. As such, I bear only the costs, and none of the benefits (and, indeed, one of the benefits - alternatives - is a serious cost). I've moved from a simple and thoroughly understood situation to a complex and poorly understood situation.

What I see however is that there is a way for me to avoid these costs and return to the simple situation.
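
As an aside on point 3: one partial check that does not involve libatomic at all is GCC's __atomic_always_lock_free builtin, which is evaluated at compile time. What follows is only a minimal sketch, assuming a GCC-compatible compiler; the two helper function names are made up for illustration and are not part of any library.

```c
#include <stddef.h>

/* Hypothetical helper names; only __atomic_always_lock_free is GCC's. */

/* Nonzero only if the compiler promises, at compile time, that atomic
   operations on a double-word object (two pointers wide) are always
   lock-free on the target being compiled for. */
static inline int dwcas_guaranteed_lock_free(void)
{
    return __atomic_always_lock_free(2 * sizeof(void *), 0);
}

/* The same question for a single word, for comparison. */
static inline int word_cas_guaranteed_lock_free(void)
{
    return __atomic_always_lock_free(sizeof(void *), 0);
}
```

As I read the GCC documentation, a zero result for the double-word size means only that the compiler makes no compile-time promise; it does not by itself prove that a lock is taken at run time, so this is a conservative check and does not answer the end-user build question. I believe the same expression can also be placed in a _Static_assert by a code base that would rather fail to build than silently accept a substitute.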
My code has an abstraction layer, and I can implement inline assembly for double-word CAS on 64-bit platforms and use that instead of __atomic and __sync. This only has to be done for x86_64 (very simple) and aarch64 (complex, alas, so it goes), since they are the only platforms to offer this. It is not needed for 32-bit platforms, since GCC does not use libatomic to support double-word CAS on these platforms.

I may be wrong, but I think it is clear to the reader that this is the sensible choice for me. It's a small amount of work, with testable code, rather than the vague and ongoing task of keeping informed about the state of libatomic implementation, platform support and bugs, and of course the possibly unsolvable problem of knowing on an end-user build whether libatomic is emitting a lock-free instruction or something else.

Of course, I have to keep informed about the state of the GCC implementation of __sync and __atomic, but they are less complex, since they have no external dependency, and if they are unsupported I know it, as they offer no alternatives. I can continue to use the knowledge I have built up about these APIs and not have to build up a second set of knowledge in parallel about libatomic.
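
To make the x86_64 half of the inline assembly approach concrete, here is a minimal sketch of what such a routine could look like. It assumes GCC-style extended asm and a processor that has the cmpxchg16b instruction; the names dword_t and dwcas are made up for illustration and do not come from any existing code base. The aarch64 case is not covered.

```c
#include <stdint.h>

#if !defined(__x86_64__)
#error "this sketch only covers x86_64"
#endif

/* Hypothetical type: two adjacent 64-bit words. cmpxchg16b requires
   its memory operand to be 16-byte aligned. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} __attribute__((aligned(16))) dword_t;

/* Hypothetical double-word CAS.
   Returns 1 if *dest equalled *expected and was replaced by desired.
   Otherwise returns 0 and stores the value observed in *dest into
   *expected. No library call is involved; if the instruction is not
   available, there is no fallback. */
static inline int dwcas(volatile dword_t *dest, dword_t *expected,
                        dword_t desired)
{
    unsigned char ok;

    __asm__ __volatile__(
        "lock cmpxchg16b %[dest]\n\t"  /* compare rdx:rax, store rcx:rbx */
        "setz %[ok]"                   /* ZF is set on success */
        : [dest] "+m" (*dest),
          "+a" (expected->lo),
          "+d" (expected->hi),
          [ok] "=q" (ok)
        : "b" (desired.lo),
          "c" (desired.hi)
        : "cc", "memory");

    return ok;
}
```

The "memory" clobber makes the asm a compiler barrier as well, which is a conservative choice for a sketch; an abstraction layer could wrap a routine like this behind the same macro it already uses for __sync or __atomic on other platforms.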