On Thu, 29 Jun 2017, Andrew Haley wrote:
> Well, yeah. We can only really blame ARM for this: they provided a
> double-word CAS but no way to define a double-word atomic load which
> does not also store. I hesitate to place blame on the ARM architects,
> a splendid and diligent bunch, but there it is. I have no idea why
> LDXP doesn't work as an atomic load, but it does not.

FWIW, the situation is similar on amd64: there's no way to perform an atomic
double-word load in a read-only manner; for 128-bit SSE loads it is explicitly
documented that their atomicity is _not_ guaranteed.

In principle there's a nice solution involving the vdso and kernel assists:

- have __atomic_load_128 be implemented with a new syscall that is supposed
  to be handled via the vdso,
- use double-word ll/sc or cas in the vdso to implement the load,
- if it doesn't trap, yay! success,
- if it traps, the kernel can easily see if the trap was from the vdso, and
  can emulate the load, with either a plain load from read-only memory, or
  by redoing the cas on writable memory with kernel privileges; the only case
  where it can't naturally proceed is volatile device memory, I think?

(but of course that's a nontrivial amount of work for a somewhat "academic"
issue)

Alexander
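
For illustration only, here is a minimal sketch of the "load implemented with
cas" step from the list above. It assumes the GCC/Clang __atomic builtins, and
load_via_cas is a hypothetical name, not an existing libatomic or vdso entry
point. It operates on a 64-bit word so that it builds without extra flags; the
128-bit case would have the same shape but typically needs -mcx16 or libatomic
on amd64.

```c
#include <stdint.h>

/* Hypothetical helper: read *p atomically using only compare-and-swap.
 * Illustrative sketch on a 64-bit word; not taken from any real library. */
static uint64_t load_via_cas(uint64_t *p)
{
    uint64_t expected = 0;

    /* If *p == 0, the CAS succeeds and stores 0 back, leaving the value
     * unchanged. Otherwise it fails and copies the current value of *p
     * into expected. Either way expected ends up holding the value read. */
    __atomic_compare_exchange_n(p, &expected, expected, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}
```

The pointer is deliberately not const: the CAS needs write access to the
location even when it changes nothing, which is why this kind of load can trap
on read-only memory and why a fallback path is needed there.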