On Fri, May 31, 2024 at 04:56:28AM +0100, Maciej W. Rozycki wrote:
> On Wed, 29 May 2024, Paul E. McKenney wrote:
>
> > > Mind that the read-modify-write sequence that software does for sub-word
> > > write accesses with original Alpha hardware is precisely what hardware
> > > would have to do anyway and support for that was deliberately omitted by
> > > the architecture designers from the ISA to give it performance advantages
> > > quoted in the architecture manual. The only difference here is that with
> > > hardware read-modify-write operations atomicity for sub-word accesses is
> > > guaranteed by the ISA, however for software read-modify-write it has to be
> > > explicitly coded using the usual load-locked/store-conditional sequence in
> > > a loop. I don't think it's a big deal really, it should be trivial to do
> > > in the relevant accessors, along with the memory barriers that are needed
> > > anyway for EV56+ and possibly other ports such as the MIPS one.
> >
> > There shouldn't be any memory barriers required, and don't EV56+ have
> > single-byte loads and stores?
>
> I should have commented on this in my original reply.
>
> You're the RCU expert so you know the answer. I don't. If it's OK for
> successive writes to get reordered, or readers to see a stale value, then
> you don't need memory barriers. Otherwise you do. Whether byte accesses
> are available or not does not matter, the CPU *will* do reordering if it's
> allowed to (or more specifically, it won't do anything to prevent it from
> happening, especially in SMP configurations; I can't remember offhand if
> there are cases with UP). Also adjacent byte writes may be merged, but I
> suppose it does not matter, or does it?

RCU uses whichever wrapper is required. For example, if ordering is
required, it might use smp_store_release() and smp_load_acquire().
If ordering does not matter, it might use WRITE_ONCE() and READ_ONCE().
If tearing/fusing/merging does not matter, as in there are no concurrent
accesses, it uses plain C-language loads and stores.

> NB MIPS has similar architectural arrangements (and a bunch of barriers
> defined in the ISA), it's just most implementations are actually strongly
> ordered, so most people can't see the effects of this. With MIPS I know
> for sure there are cases of UP reordering, but they only really matter for
> MMIO use cases and not regular memory.

Any given architecture is required to provide architecture-specific
implementations of the various functions that meet the requirements of
the Linux-kernel memory model. See tools/memory-model for more information.

							Thanx, Paul
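
For concreteness, the three cases above could be sketched in userspace C
roughly as follows. This is only an illustration under stated assumptions:
it uses C11 atomics as stand-ins for the kernel wrappers (release/acquire
for smp_store_release()/smp_load_acquire(), relaxed atomics as the nearest
analogue of WRITE_ONCE()/READ_ONCE()), which approximates the kernel
wrappers rather than reproducing their definitions. Every variable and
function name below is made up for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state, invented for this sketch. */
static int payload;             /* plain data, published via 'ready' */
static atomic_bool ready;       /* flag whose ordering matters */
static atomic_int stat_value;   /* shared, but ordering does not matter */

/* Ordering required: the release store stands in for smp_store_release(). */
void publish(int value)
{
	payload = value;	/* plain store, ordered before the release below */
	atomic_store_explicit(&ready, true, memory_order_release);
}

/* Ordering required: the acquire load stands in for smp_load_acquire(). */
bool try_consume(int *out)
{
	if (!atomic_load_explicit(&ready, memory_order_acquire))
		return false;
	*out = payload;		/* sees the value stored before the release */
	return true;
}

/*
 * Concurrent access, no ordering needed: relaxed atomics are an
 * approximation of WRITE_ONCE()/READ_ONCE() (no tearing, no ordering).
 */
void set_stat(int v)
{
	atomic_store_explicit(&stat_value, v, memory_order_relaxed);
}

int get_stat(void)
{
	return atomic_load_explicit(&stat_value, memory_order_relaxed);
}

/* No concurrent access at all: plain C-language loads and stores. */
int sum_private(const int *a, int n)
{
	int sum = 0;

	for (int i = 0; i < n; i++)
		sum += a[i];
	return sum;
}
```

In a real program publish() and try_consume() would run on different
threads; kernel code would use the wrappers themselves, whose
architecture-specific implementations supply whatever barriers that
architecture needs.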