On Fri, 23 Aug 2024, Will Deacon wrote:

> On Mon, Aug 19, 2024 at 11:30:15AM -0700, Christoph Lameter via B4 Relay wrote:
> > +static __always_inline unsigned					\
> > +__seqprop_##lockname##_sequence_acquire(const seqcount_##lockname##_t *s)	\
> > +{									\
> > +	unsigned seq = smp_load_acquire(&s->seqcount.sequence);	\
> > +									\
> > +	if (!IS_ENABLED(CONFIG_PREEMPT_RT))				\
> > +		return seq;						\
> > +									\
> > +	if (preemptible && unlikely(seq & 1)) {				\
> > +		__SEQ_LOCK(lockbase##_lock(s->lock));			\
> > +		__SEQ_LOCK(lockbase##_unlock(s->lock));			\
> > +									\
> > +		/*							\
> > +		 * Re-read the sequence counter since the (possibly	\
> > +		 * preempted) writer made progress.			\
> > +		 */							\
> > +		seq = smp_load_acquire(&s->seqcount.sequence);		\
>
> We could probably do even better with LDAPR here, as that should be
> sufficient for this. It's a can of worms though, as it's not implemented
> on all CPUs and relaxing smp_load_acquire() might introduce subtle
> breakage in places where it's used to build other types of lock. Maybe
> you can hack something to see if there's any performance left behind
> without it?

I added the following patch. The kernel booted fine, and there was no
change in the cycles for read_seq():

LDAPR
---------------------------
Test        Single  2 CPU  4 CPU  8 CPU  16 CPU  32 CPU  64 CPU     ALL
write seq :     13     98    385    764    1551    3043    6259   11922
read seq  :      8      8      8      8       8       8       9      10
rw seq    :      8    101    247    300     467     742    1384    2101

LDAR
---------------------------
Test        Single  2 CPU  4 CPU  8 CPU  16 CPU  32 CPU  64 CPU     ALL
write seq :     13     90    343    785    1533    3032    6315   11073
read seq  :      8      8      8      8       8       8       9      11
rw seq    :      8     79    227    313     423     755    1313    2220

Index: linux/arch/arm64/include/asm/barrier.h
===================================================================
--- linux.orig/arch/arm64/include/asm/barrier.h
+++ linux/arch/arm64/include/asm/barrier.h
@@ -167,22 +167,22 @@ do {									\
 	kasan_check_read(__p, sizeof(*p));				\
 	switch (sizeof(*p)) {						\
 	case 1:								\
-		asm volatile ("ldarb %w0, %1"				\
+		asm volatile (".arch_extension rcpc\nldaprb %w0, %1"	\
 			: "=r" (*(__u8 *)__u.__c)			\
 			: "Q" (*__p) : "memory");			\
 		break;							\
 	case 2:								\
-		asm volatile ("ldarh %w0, %1"				\
+		asm volatile (".arch_extension rcpc\nldaprh %w0, %1"	\
 			: "=r" (*(__u16 *)__u.__c)			\
 			: "Q" (*__p) : "memory");			\
 		break;							\
 	case 4:								\
-		asm volatile ("ldar %w0, %1"				\
+		asm volatile (".arch_extension rcpc\nldapr %w0, %1"	\
 			: "=r" (*(__u32 *)__u.__c)			\
 			: "Q" (*__p) : "memory");			\
 		break;							\
 	case 8:								\
-		asm volatile ("ldar %0, %1"				\
+		asm volatile (".arch_extension rcpc\nldapr %0, %1"	\
 			: "=r" (*(__u64 *)__u.__c)			\
 			: "Q" (*__p) : "memory");			\
 		break;							\
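
As an aside for anyone following along without the kernel tree handy: the
read path being benchmarked above is the usual seqcount pattern, i.e. an
acquire load of the sequence word, the reads of the protected data, then a
re-check of the counter. Below is a rough userspace sketch of that pattern
using C11 atomics; the demo_* names are made up for illustration, the real
code lives in include/linux/seqlock.h and additionally handles PREEMPT_RT,
lockdep and the writer side, and a real payload read would need
READ_ONCE()-style annotation to be race-free.

/* Minimal userspace sketch of a seqcount read loop (illustration only). */
#include <stdatomic.h>
#include <stdio.h>

struct demo_seq {
	atomic_uint sequence;		/* odd while a writer is in progress */
	unsigned int data;		/* payload protected by the counter */
};

static unsigned int demo_read_begin(struct demo_seq *s)
{
	unsigned int seq;

	/* Acquire load: on arm64 this is the LDAR (or LDAPR) in question. */
	while ((seq = atomic_load_explicit(&s->sequence,
					   memory_order_acquire)) & 1)
		;	/* a writer holds the counter odd; spin */

	return seq;
}

static int demo_read_retry(struct demo_seq *s, unsigned int seq)
{
	/* Order the data reads before the re-check, roughly smp_rmb(). */
	atomic_thread_fence(memory_order_acquire);

	return atomic_load_explicit(&s->sequence,
				    memory_order_relaxed) != seq;
}

int main(void)
{
	struct demo_seq s = { .sequence = 0, .data = 42 };
	unsigned int seq, val;

	do {
		seq = demo_read_begin(&s);
		val = s.data;		/* read-only critical section */
	} while (demo_read_retry(&s, seq));

	printf("read %u at sequence %u\n", val, seq);
	return 0;
}

The acquire load in demo_read_begin() is the one that currently compiles to
LDAR on arm64 and to LDAPR with the barrier.h hack above; as Will notes, the
weaker RCpc ordering should be sufficient for a read-only path like this,
and at least on this machine it makes no measurable difference to read_seq.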