On Wed, Oct 23, 2024 at 01:34:16PM -0700, Linus Torvalds wrote:
> On Wed, 23 Oct 2024 at 12:45, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > Do we want to do the complementing patch and make write_seqcount_end()
> > use smp_store_release() ?
> >
> > I think at least ARM (the 32bit thing) has wmb but uses mb for
> > store_release. But I also think I don't really care about that.
>
> So unlike the "acquire vs rmb", there are architectures where "wmb" is
> noticeably cheaper than a "store release".
>
> Just as an example, on alpha, a "store release" is a full memory
> barrier followed by the store, because it needs to serialize previous
> loads too. But smp_wmb() is lightweight.
>
> Typically in traditional (pre acquire/release) architectures "wmb"
> only ordered the CPU write queues, so "wmb" has always been cheap
> pretty much everywhere.
>
> And I *suspect* that alpha isn't the outlier in having a much cheaper
> wmb than store-release.
>
> But yeah, it's kind of ugly how we now have three completely different
> orderings for seqcounts:
>
>  - the initial load is done with the smp_load_acquire
>
>  - the final load (the "retry") is done with a smp_rmb (because an
>    acquire orders _subsequent_ loads, not the ones inside the lock: we'd
>    actually want a "smp_load_release()", but such a thing doesn't exist)
>
>  - the writer side uses smp_wmb
>
> (and arguably there's a fourth pattern: the latching case uses double
> smp_wmb, because it orders the sequence count wrt both preceding and
> subsequent stores)
>
> Anyway, obviously on x86 (and s390) none of this matters.
>
> On arm64, I _suspect_ they are mostly the same, but it's going to be
> very microarchitecture-dependent. Neither should be expensive, but wmb
> really is a fundamentally lightweight operation.

I agree here. An STLR additionally orders PO-prior loads on arm64, so I'd
stick with the wmb().

Will
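For readers following along, here is a minimal userspace sketch of the three seqcount orderings listed above, assuming C11 <stdatomic.h> as a stand-in for the kernel primitives. The struct and function names (seq_pair, read_begin, read_retry, write_begin, write_end) are hypothetical and are not kernel API. C11 has no load-load-only or store-store-only fence, so smp_rmb() and smp_wmb() are both approximated with the nearest, stronger fences; in particular the writer's release fence also orders prior loads, which is exactly the extra work attributed to STLR above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace analogue of a seqcount protecting two ints. */
struct seq_pair {
    atomic_uint seq;   /* even: stable, odd: write in progress */
    atomic_int a;      /* protected data; relaxed atomics avoid C11 data races */
    atomic_int b;
};

static void seq_pair_init(struct seq_pair *s)
{
    atomic_init(&s->seq, 0u);
    atomic_init(&s->a, 0);
    atomic_init(&s->b, 0);
}

/* Initial load: an acquire load, the analogue of smp_load_acquire(). */
static unsigned read_begin(struct seq_pair *s)
{
    unsigned seq;
    /* Spin while a writer is mid-update (odd count). */
    while ((seq = atomic_load_explicit(&s->seq, memory_order_acquire)) & 1u)
        ;
    return seq;
}

/* Retry: the data loads must complete before the count is re-read.
 * Assumption: an acquire fence stands in for smp_rmb(); it is stronger,
 * since it also orders prior loads against later stores. */
static bool read_retry(struct seq_pair *s, unsigned start)
{
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&s->seq, memory_order_relaxed) != start;
}

/* Writer side; writers must be serialized by the caller. */
static void write_begin(struct seq_pair *s)
{
    unsigned seq = atomic_load_explicit(&s->seq, memory_order_relaxed);
    assert((seq & 1u) == 0 && "writers must be serialized by the caller");
    atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
    /* First smp_wmb() stand-in: odd count ordered before the data stores. */
    atomic_thread_fence(memory_order_release);
}

static void write_end(struct seq_pair *s)
{
    unsigned seq = atomic_load_explicit(&s->seq, memory_order_relaxed);
    /* Second smp_wmb() stand-in: data stores ordered before the even count.
     * A release fence also orders prior loads (store-release strength),
     * which a real wmb does not need to do. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
}

/* Take a consistent snapshot of (a, b), retrying if a writer interfered. */
static void read_pair(struct seq_pair *s, int *a, int *b)
{
    unsigned start;
    do {
        start = read_begin(s);
        *a = atomic_load_explicit(&s->a, memory_order_relaxed);
        *b = atomic_load_explicit(&s->b, memory_order_relaxed);
    } while (read_retry(s, start));
}

static void write_pair(struct seq_pair *s, int a, int b)
{
    write_begin(s);
    atomic_store_explicit(&s->a, a, memory_order_relaxed);
    atomic_store_explicit(&s->b, b, memory_order_relaxed);
    write_end(s);
}
```

The point of the sketch is the mapping, not performance: because portable C11 cannot express a store-store-only barrier, the writer path here pays for store-release ordering, which is the cost the kernel avoids by keeping smp_wmb().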