On Thu, 2 May 2024 at 16:12, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> One of RCU's state machines uses smp_store_release() to start the
> state machine (only one task gets to do this) and cmpxchg() to update
> state beyond that point.  And the state is 8 bits so that it and other
> state fits into 32 bits to allow a single check for multiple conditions
> elsewhere.

Note that since alpha lacks the release-acquire model, it's always
going to be a full memory barrier before the store.

And then the store turns into a load-mask-store for older alphas.

So it's going to be a complete mess from a performance standpoint regardless.

Happily, I doubt anybody really cares.

I've occasionally wondered if we have situations where the
"smp_store_release()" only cares about previous *writes* being ordered
(ie a "smp_wmb()+WRITE_ONCE" would be sufficient).

It makes no difference on x86 (all stores are releases), power64 (wmb
and store_release are both LWSYNC) or arm64 (str is documented to be
cheaper than DMB).

On alpha, smp_wmb()+WRITE_ONCE() is cheaper than smp_store_release(),
but nobody sane cares.

But *if* we have a situation where the "smp_store_release()" might be
just a "previous writes need to be visible" rather than ordering
previous reads too, we could maybe introduce that kind of op.

I _think_ the RCU writes tend to be of that kind?

             Linus
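
For illustration only, the two variants discussed above can be sketched in
userspace C11. This is not kernel code. Every sketch_* name is hypothetical,
and the C11 atomics are stand-ins assumed here for smp_store_release(),
smp_wmb()+WRITE_ONCE() and cmpxchg(). C11 has no store-store-only fence, so
the second variant uses a release fence, which is stronger than smp_wmb().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical state machine: plain payload plus an 8-bit state byte. */
struct sketch_sm {
	int payload;            /* data written before the state change */
	_Atomic uint8_t state;  /* 0 = idle, nonzero = started */
};

/* Stand-in for smp_store_release(): orders earlier loads AND stores. */
static inline void sketch_store_release(_Atomic uint8_t *p, uint8_t v)
{
	atomic_store_explicit(p, v, memory_order_release);
}

/*
 * Stand-in for smp_wmb() + WRITE_ONCE(). Assumption: a C11 release
 * fence is used because C11 lacks a write-only barrier; the kernel's
 * smp_wmb() orders only earlier stores, not earlier loads.
 */
static inline void sketch_wmb_store(_Atomic uint8_t *p, uint8_t v)
{
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(p, v, memory_order_relaxed);
}

/* Start the machine: only previous *writes* need to be visible. */
static inline void sketch_start(struct sketch_sm *sm, int payload)
{
	sm->payload = payload;
	sketch_wmb_store(&sm->state, 1);
}

/* Later transitions use compare-and-exchange; returns 1 on success. */
static inline int sketch_advance(struct sketch_sm *sm, uint8_t old,
				 uint8_t next)
{
	return atomic_compare_exchange_strong(&sm->state, &old, next);
}

/* Reader: acquire-load the state, then read the payload if started. */
static inline int sketch_read(struct sketch_sm *sm, int *out)
{
	if (atomic_load_explicit(&sm->state, memory_order_acquire) == 0)
		return 0;
	*out = sm->payload;
	return 1;
}
```

Swapping sketch_wmb_store() for sketch_store_release() in sketch_start()
gives the other variant. The sketch shows only the shape of the two options,
not their relative cost on any architecture.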