On Thu, May 02, 2024 at 04:32:35PM -0700, Linus Torvalds wrote:
> On Thu, 2 May 2024 at 16:12, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > One of RCU's state machines uses smp_store_release() to start the
> > state machine (only one task gets to do this) and cmpxchg() to update
> > state beyond that point. And the state is 8 bits so that it and other
> > state fit into 32 bits to allow a single check for multiple conditions
> > elsewhere.
>
> Note that since alpha lacks the release-acquire model, it's always
> going to be a full memory barrier before the store.
>
> And then the store turns into a load-mask-store for older alphas.
>
> So it's going to be a complete mess from a performance standpoint regardless.

And on those older machines, a mess functionally because the other
three bytes in that same 32-bit word can be concurrently updated.
Hence Arnd's patch being necessary here.

EV56 and later all have single-byte stores, so they are OK. They were
introduced in the mid-1990s, so even they are antiques. ;-)

> Happily, I doubt anybody really cares.

Here is hoping!

> I've occasionally wondered if we have situations where the
> "smp_store_release()" only cares about previous *writes* being ordered
> (ie a "smp_wmb()+WRITE_ONCE" would be sufficient).

Back in the day, rcu_assign_pointer() worked this way. But later there
were a few use cases where ordering prior reads was needed.

And in this case, we just barely need that full store-release
functionality. There is a preceding mutex lock-unlock pair that provides
a full barrier post-boot on almost all systems.

> It makes no difference on x86 (all stores are releases), power64 (wmb
> and store_release are both LWSYNC) or arm64 (str is documented to be
> cheaper than DMB).
>
> On alpha, smp_wmb()+WRITE_ONCE() is cheaper than smp_store_release(),
> but nobody sane cares.
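
To make the functional problem with the load-mask-store sequence concrete: without single-byte stores, storing one byte means loading the containing word, masking in the new byte, and storing the whole word back. The following is a minimal single-threaded C sketch, not kernel code, that replays one unlucky interleaving of two CPUs each storing a different byte of the same 32-bit word this way. The helper names insert_byte() and racy_byte_stores() are hypothetical, and the real pre-EV56 Alpha instruction sequence may operate on a wider word and differs in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return `word` with byte `idx` (0 = least
 * significant) replaced by `val`. This is the "mask" and "insert"
 * part of an emulated byte store. */
static uint32_t insert_byte(uint32_t word, unsigned idx, uint8_t val)
{
    assert(idx < 4);
    uint32_t shift = 8u * idx;
    uint32_t mask = (uint32_t)0xffu << shift;
    return (word & ~mask) | ((uint32_t)val << shift);
}

/* Hypothetical replay of two CPUs doing non-atomic load-mask-store
 * byte stores to different bytes of the same word. Returns the final
 * contents of the word in "memory". */
static uint32_t racy_byte_stores(uint32_t initial)
{
    uint32_t memory = initial;

    uint32_t cpu0_copy = memory;  /* CPU 0 loads the word */
    uint32_t cpu1_copy = memory;  /* CPU 1 loads the same word */

    /* CPU 0 stores byte 0 by writing back the whole word. */
    memory = insert_byte(cpu0_copy, 0, 0x11);

    /* CPU 1 stores byte 1, but its stale copy of byte 0 is written
     * back too, silently undoing CPU 0's store. */
    memory = insert_byte(cpu1_copy, 1, 0x22);

    return memory;
}
```

Starting from zero, racy_byte_stores(0) returns 0x00002200 rather than 0x00002211: CPU 0's update to byte 0 is lost even though the two CPUs never touched the same byte. Doing the read-modify-write atomically, for example with an LL/SC or cmpxchg() retry loop on the containing word, avoids the lost update at the cost of yet more overhead.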
>
> But *if* we have a situation where the "smp_store_release()" might be
> just a "previous writes need to be visible" rather than ordering
> previous reads too, we could maybe introduce that kind of op. I
> _think_ the RCU writes tend to be of that kind?

Most of the time, rcu_assign_pointer() only needs to order prior writes,
not both reads and writes. In theory, we could make something like an
rcu_assign_pointer_reads_too(), though hopefully with a shorter name,
and go back to smp_wmb() for rcu_assign_pointer(). But in practice,
I am having a really hard time convincing myself that it would be
worth it.

							Thanx, Paul