On 10/10/2020 14:39, Jonathan Wakely wrote:
> On Fri, 9 Oct 2020 at 19:29, David Brown <david.brown@xxxxxxxxxxxx> wrote:
>>
>> I don't know if this can be answered here, or would be best on the
>> development mailing list.  But I'll start on the help list.
>>
>> I work primarily with microcontrollers, with 32-bit ARM Cortex-M devices
>> being the most common these days.  I've been trying out atomics in gcc,
>> and I find it badly lacking.  (I've tried C11 <stdatomic.h>, C++11
>> <atomic>, and the gcc builtins - they all generate the same results,
>> which is to be expected.)  I'm concentrating on plain loads and stores
>> at the moment, not other atomic operations.
>>
>> These microcontrollers are all single core, so memory ordering does not
>> matter.
>>
>> For 8-bit, 16-bit and 32-bit types, atomic accesses are just simple
>> loads and stores.  These are generated fine.
>>
>> But for 64-bit and above, there are library calls to a compiler-provided
>> library.  For the Cortex M4 and M7 cores (and several other Cortex M
>> cores), the "load double register" and "store double register"
>> instructions are atomic (but not suitable for use with volatile data,
>> since they are restarted if they are interrupted).  The compiler
>> generates these for normal 64-bit types, but not for atomics.
>>
>> For larger types, the situation is far, far worse.  Not only is the
>> library code inefficient on these devices (disabling and re-enabling
>> global interrupts is the optimal solution in most cases, with load/store
>> with reservation being a second option), but it is /wrong/.  The library
>> uses spin locks (AFAICS) - on a single core system, that generally means
>> deadlocking the processor.  That is worse than useless.
>>
>> Is there any way I can replace this library with my own code here, while
>> still using the language atomics?
>
> Yes. My understanding is that libatomic is designed to be replaceable
> by users who want to provide their own custom implementations of the
> API.
>
> You're using bare metal ARM, right? For Arm on Linux I think there are
> kernel helpers that make the atomics efficient even when the hardware
> doesn't support them.
>

Yes, I am using bare metal (well, sometimes an RTOS - but that's still a
lot closer to bare metal than to a host OS like Linux).  And I have a
single core - that makes atomics easier because I don't even need "dmb"
or other memory barrier instructions, and I can freely use the "disable
interrupts around the access" strategy.  On the other hand, it means
that the spin locks in libatomic are completely wrong.

If I understand you correctly, you mean that I can simply implement my
own version of __atomic_load_8 and other functions in libatomic?

I had a quick test (using the godbolt.org online compiler).  By adding
this to my file:

    extern inline uint64_t __atomic_load_8(const volatile void * p, int order)
    {
        (void) order;
        const volatile uint64_t * q = (const volatile uint64_t *) p;
        return *q;
    }

then a straight load of a 64-bit atomic becomes a single "ldrd" load
double register instruction, which is optimal for this processor.  (In a
finished solution, I'd want to check that this is correct for different
flags - possibly adding function attributes for optimisation or inline
assembly to ensure that it is always correct.  But that's a detail for
me to check.)

The same worked for __atomic_store_8.  (The general load/store functions
are a bit more involved, as are the read-modify-write atomic functions.)

Is this strategy guaranteed to work in gcc, or is it a case of "it works
in a simple test, but might fail in a complicated program or with
different flags"?
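
For reference, here is a minimal sketch of what the "disable interrupts
around the access" strategy might look like for the 64-bit and the
generic (arbitrary size) cases on a single-core device.  Everything named
here is hypothetical: sketch_atomic_load_8, sketch_atomic_store_8 and
sketch_atomic_load are neutral stand-ins rather than the real libatomic
entry points (whether gcc reliably picks up user replacements for those
is the open question above), and irq_save / irq_restore are placeholder
port hooks.  The hooks are host stubs so that the code compiles anywhere;
on a Cortex-M they would presumably save PRIMASK, mask interrupts, and
later restore the saved value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical port hooks.  These host stubs only count how often a
   critical section is entered; a Cortex-M port would save PRIMASK and
   mask interrupts in irq_save(), then write the saved value back in
   irq_restore(). */
static unsigned irq_lock_count;

static uint32_t irq_save(void)
{
    irq_lock_count++;
    return 0;               /* stub: no real interrupt state to save */
}

static void irq_restore(uint32_t state)
{
    (void) state;           /* stub: nothing to restore on a host */
}

/* 64-bit load inside a critical section.  The memory order argument is
   ignored on the assumption of a single core. */
uint64_t sketch_atomic_load_8(const volatile void *p, int order)
{
    (void) order;
    uint32_t state = irq_save();
    uint64_t v = *(const volatile uint64_t *) p;
    irq_restore(state);
    return v;
}

/* 64-bit store inside a critical section. */
void sketch_atomic_store_8(volatile void *p, uint64_t v, int order)
{
    (void) order;
    uint32_t state = irq_save();
    *(volatile uint64_t *) p = v;
    irq_restore(state);
}

/* Generic load of `size` bytes from src into dest.  The copy is done
   bytewise through a volatile pointer, with interrupts held off for
   the whole copy. */
void sketch_atomic_load(size_t size, const volatile void *src,
                        void *dest, int order)
{
    const volatile unsigned char *s = (const volatile unsigned char *) src;
    unsigned char *d = (unsigned char *) dest;

    (void) order;
    uint32_t state = irq_save();
    for (size_t i = 0; i < size; i++)
        d[i] = s[i];
    irq_restore(state);
}
```

The save/restore pair (rather than a plain disable/enable) is chosen so
that the functions can be called from code that already runs with
interrupts masked without re-enabling them by accident.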