On Thu, Jul 29, 2021 at 10:55:52AM +0100, Will Deacon wrote:

> Overall, I'm not thrilled to bits by extending the atomics API with
> operations that cannot be implemented efficiently on any (?) architectures
> and are only used by the qspinlock slowpath on machines with more than 16K
> CPUs.

My rationale for proposing this primitive is similar to the existence of
other composite atomic ops from the Misc (and refcount) class (as per
atomic_t.txt). They're common/performance-sensitive operations that, on
LL/SC platforms, can be better implemented than a cmpxchg() loop.

Specifically here, it can be used to implement short xchg() in an
architecturally neutral way, but more importantly it provides forward
progress on LL/SC, while most LL/SC based cmpxchg() implementations are
arguably broken there.

People seem to really struggle to implement that sanely.

It's such a shame we can't have the compiler generate sane composite
atomics for us..

> I also think we're lacking documentation justifying when you would use this
> new primitive over e.g. a sub-word WRITE_ONCE() on architectures that
> support those, especially for the non-returning variants.

Given the sub-word ordering 'fun', this might come in handy somewhere
:-) But yes, its existence is more of a completeness/symmetry argument
than anything else.
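
To make the short xchg() point concrete, here is a rough userspace sketch
of the idea. It uses C11 <stdatomic.h> as a stand-in for the kernel
atomics, and the names fetch_and_or() and xchg16_hi() are made up for
illustration; neither is meant as the actual proposed interface. The
fallback shown is the generic compare-exchange loop, which is exactly the
part an LL/SC architecture could presumably replace with a single
load-linked/store-conditional sequence.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical composite op (name is an assumption for this sketch):
 *
 *	old = *v; *v = (old & and_mask) | or_bits; return old;
 *
 * Generic fallback only: a compare-exchange loop. On failure the
 * compare-exchange reloads 'old', and the new value is recomputed
 * from it on the next pass.
 */
static inline uint32_t fetch_and_or(_Atomic uint32_t *v,
				    uint32_t and_mask, uint32_t or_bits)
{
	uint32_t old = atomic_load_explicit(v, memory_order_relaxed);

	while (!atomic_compare_exchange_weak_explicit(v, &old,
			(old & and_mask) | or_bits,
			memory_order_seq_cst, memory_order_relaxed))
		;

	return old;
}

/*
 * 16-bit xchg() on the upper half of a 32-bit word, built on the op
 * above: clear the upper half, OR in the new value, return the old
 * upper half. The lower half is preserved.
 */
static inline uint16_t xchg16_hi(_Atomic uint32_t *v, uint16_t val)
{
	uint32_t old = fetch_and_or(v, 0x0000ffffu, (uint32_t)val << 16);

	return (uint16_t)(old >> 16);
}
```

With a word holding 0xaaaa5555, xchg16_hi(&w, 0x1234) returns 0xaaaa and
leaves 0x12345555 behind, without needing a native 16-bit xchg.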