On Wed, 23 Oct 2024 at 12:45, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> Do we want to do the complementing patch and make write_seqcount_end()
> use smp_store_release() ?
>
> I think at least ARM (the 32bit thing) has wmb but uses mb for
> store_release. But I also think I don't really care about that.

So unlike the "acquire vs rmb", there are architectures where "wmb" is
noticeably cheaper than a "store release".

Just as an example, on alpha, a "store release" is a full memory
barrier followed by the store, because it needs to serialize previous
loads too. But smp_wmb() is lightweight.

Typically in traditional (pre acquire/release) architectures "wmb"
only ordered the CPU write queues, so "wmb" has always been cheap
pretty much everywhere.

And I *suspect* that alpha isn't the outlier in having a much cheaper
wmb than store-release.

But yeah, it's kind of ugly how we now have three completely different
orderings for seqcounts:

 - the initial load is done with the smp_load_acquire

 - the final load (the "retry") is done with a smp_rmb (because an
   acquire orders _subsequent_ loads, not the ones inside the lock:
   we'd actually want a "smp_load_release()", but such a thing doesn't
   exist)

 - the writer side uses smp_wmb

(and arguably there's a fourth pattern: the latching cases use double
smp_wmb, because it orders the sequence count wrt both preceding and
subsequent stores)

Anyway, obviously on x86 (and s390) none of this matters.

On arm64, I _suspect_ they are mostly the same, but it's going to be
very microarchitecture-dependent. Neither should be expensive, but wmb
really is a fundamentally lightweight operation.

On 32-bit arm, wmb should be cheaper ("ishst" only waits for earlier
stores).

On powerpc, wmb is cheaper on older CPUs (eieio vs sync), but the same
on newer CPUs (lwsync).

On alpha, wmb is definitely cheaper, but I doubt anybody really cares.

Others? I stopped looking, and am not familiar enough.

             Linus
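
For reference, the three orderings listed above can be sketched in
userspace C11. This is only an illustration, not the kernel's seqcount
code: it assumes <stdatomic.h> instead of the kernel primitives, and
the names seq_pair, seq_init, seq_write and seq_read_sum are made up
for the example. C11 has no store-only or load-only fence, so a release
fence stands in for smp_wmb() and an acquire fence for smp_rmb(); both
are stronger than the barriers they imitate. A single writer is
assumed.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical seqcount-protected pair; not a kernel type. */
struct seq_pair {
	atomic_uint seq;	/* even: stable, odd: write in progress */
	atomic_int a, b;	/* payload, accessed with relaxed atomics */
};

void seq_init(struct seq_pair *p)
{
	atomic_init(&p->seq, 0);
	atomic_init(&p->a, 0);
	atomic_init(&p->b, 0);
}

/* Writer side: both barriers play the role of smp_wmb(). */
void seq_write(struct seq_pair *p, int a, int b)
{
	unsigned int s = atomic_load_explicit(&p->seq, memory_order_relaxed);

	assert((s & 1) == 0);	/* single writer: no write already in flight */

	atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* ~ smp_wmb() */

	atomic_store_explicit(&p->a, a, memory_order_relaxed);
	atomic_store_explicit(&p->b, b, memory_order_relaxed);

	atomic_thread_fence(memory_order_release);	/* ~ smp_wmb() */
	atomic_store_explicit(&p->seq, s + 2, memory_order_relaxed);
}

/* Reader side: acquire on the initial load, rmb-like fence before retry. */
int seq_read_sum(struct seq_pair *p)
{
	unsigned int s;
	int a, b;

	do {
		/* initial load: orders the payload loads after it */
		s = atomic_load_explicit(&p->seq, memory_order_acquire);

		a = atomic_load_explicit(&p->a, memory_order_relaxed);
		b = atomic_load_explicit(&p->b, memory_order_relaxed);

		/* ~ smp_rmb(): payload loads complete before the re-check */
		atomic_thread_fence(memory_order_acquire);
	} while ((s & 1) ||
		 s != atomic_load_explicit(&p->seq, memory_order_relaxed));

	return a + b;
}
```

The reader cannot use an acquire load for the final check, since that
would only order loads that come after it; the fence before the
re-check is what keeps the payload loads inside the critical section.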