On Thu, 8 Aug 2024 at 06:20, Christian Brauner <brauner@xxxxxxxxxx> wrote:
>
> But then multiple times people brought up that supposedly smp_rmb() and
> smp_wmb() are cheaper because they only do load or store ordering
> whereas smp_{load,store}_{acquire,release}() do load and store ordering.

It really can go either way.

But I think we've reached a point where release/acquire is "typically
cheaper", and the reason is simply arm64.

As mentioned, on x86 none of this matters.

And on older architectures that were designed around the concept of
separate memory barriers, the rmb/wmb model thus matches that
architecture model and tends to be natural and likely the best
impedance match.

But the arm64 memory ordering was created after people had figured out
the rules of good memory ordering, and so we have this:

    https://developer.arm.com/documentation/102336/0100/Load-Acquire-and-Store-Release-instructions

and this particular quote:

   "Weaker ordering requirements that are imposed by Load-Acquire and
    Store-Release instructions allow for micro-architectural
    optimizations, which could reduce some of the performance impacts
    that are otherwise imposed by an explicit memory barrier.

    If the ordering requirement is satisfied using either a
    Load-Acquire or Store-Release, then it would be preferable to use
    these instructions instead of a DMB"

iow we now have a relevant architecture that gets memory ordering
right, and that officially prefers release/acquire ordering.

End result: we *used* to prefer rmb/wmb pairs, because

 (a) it was how we did memory ordering originally,

 (b) relevant architectures didn't care, and

 (c) it matched the questionable architectures.

And now, in the last few years, the equation has simply shifted.

So rmb/wmb has gone from "this is the only way to do it" to "this is
the legacy way to do it and it performs ok everywhere" to "this is the
historical way that some people are more used to".

For new code, release/acquire is preferred.

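To make the two styles concrete, here is a minimal user-space sketch
using C11 atomics rather than the kernel primitives. Everything in it is
an assumption made for illustration: struct msg, its fields and the four
function names are hypothetical, and the mapping to the kernel helpers is
only approximate. In particular, C11 has no pure store-store or load-load
fence, so the release and acquire fences standing in for smp_wmb() and
smp_rmb() are somewhat stronger than the kernel barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message slot; not taken from any kernel code. */
struct msg {
    int payload;        /* plain data, published via 'ready' */
    atomic_int ready;   /* 0 = empty, 1 = payload is valid */
};

/* Release/acquire style: the ordering is attached to 'ready' itself. */
static void publish_release(struct msg *m, int value)
{
    m->payload = value;
    /* Roughly what smp_store_release(&m->ready, 1) expresses. */
    atomic_store_explicit(&m->ready, 1, memory_order_release);
}

static bool consume_acquire(struct msg *m, int *out)
{
    assert(out != NULL);
    /* Roughly what smp_load_acquire(&m->ready) expresses. */
    if (!atomic_load_explicit(&m->ready, memory_order_acquire))
        return false;
    *out = m->payload;  /* ordered after the acquire load */
    return true;
}

/* Barrier style: a standalone fence sits between two plain accesses. */
static void publish_fence(struct msg *m, int value)
{
    m->payload = value;
    /* Stand-in for smp_wmb(); pairs with the fence in consume_fence(). */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&m->ready, 1, memory_order_relaxed);
}

static bool consume_fence(struct msg *m, int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&m->ready, memory_order_relaxed))
        return false;
    /* Stand-in for smp_rmb(); pairs with the fence in publish_fence(). */
    atomic_thread_fence(memory_order_acquire);
    *out = m->payload;
    return true;
}
```

Both pairs give a consumer that sees ready == 1 a valid payload. The
difference is where the ordering lives: in the first pair it is attached
to the store and load of 'ready', while in the second the pairing between
the two fences only exists in the comments.
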
And if it's *critical* code, maybe it's even worth converting from
wmb/rmb to release/acquire.

Partly because of that "it should be better on arm64", but also partly
because I think release/acquire is both a better model conceptually,
_and_ is more self-documenting (ie it's a nice explicit hand-off in
ways that some of our subtler "this wmb pairs with that rmb" code is
very much not at all self-documenting and needs very explicit and
clear comments).

Now, I'm not saying you shouldn't add a comment about a
release/acquire pair, but at the same time, the very fact that you
release a _particular_ variable and acquire that variable elsewhere
*is* a big clue.

So when I'm saying it's "more self-documenting", I want to emphasize
that "more". I'm not claiming it's _completely_ self-documenting ;)

              Linus