Add generic and target-specific support for local{,64}_try_cmpxchg and
wire up support for all targets that use the local_t infrastructure.

The series enables x86 targets to emit a special instruction for
local_try_cmpxchg, and also for local64_try_cmpxchg on x86_64.

The last patch changes __perf_output_begin in events/ring_buffer to use
the new locking primitive, which improves the generated code from:

     4b3:  48 8b 82 e8 00 00 00  mov    0xe8(%rdx),%rax
     4ba:  48 8b b8 08 04 00 00  mov    0x408(%rax),%rdi
     4c1:  8b 42 1c              mov    0x1c(%rdx),%eax
     4c4:  48 8b 4a 28           mov    0x28(%rdx),%rcx
     4c8:  85 c0                 test   %eax,%eax
           ...
     4ef:  48 89 c8              mov    %rcx,%rax
     4f2:  48 0f b1 7a 28        cmpxchg %rdi,0x28(%rdx)
     4f7:  48 39 c1              cmp    %rax,%rcx
     4fa:  75 b7                 jne    4b3 <...>

to:

     4b2:  48 8b 4a 28           mov    0x28(%rdx),%rcx
     4b6:  48 8b 82 e8 00 00 00  mov    0xe8(%rdx),%rax
     4bd:  48 8b b0 08 04 00 00  mov    0x408(%rax),%rsi
     4c4:  8b 42 1c              mov    0x1c(%rdx),%eax
     4c7:  85 c0                 test   %eax,%eax
           ...
     4d4:  48 89 c8              mov    %rcx,%rax
     4d7:  48 0f b1 72 28        cmpxchg %rsi,0x28(%rdx)
     4dc:  0f 85 d0 00 00 00     jne    5b2 <...>
           ...
     5b2:  48 89 c1              mov    %rax,%rcx
     5b5:  e9 fc fe ff ff        jmp    4b6 <...>

Please note that, in addition to the removed compare, the load from
0x28(%rdx) is moved out of the loop, and the code is rearranged
according to the likely/unlikely annotations in the source.

Cc: Richard Henderson <richard.henderson@xxxxxxxxxx>
Cc: Ivan Kokshaysky <ink@xxxxxxxxxxxxxxxxxxxx>
Cc: Matt Turner <mattst88@xxxxxxxxx>
Cc: Huacai Chen <chenhuacai@xxxxxxxxxx>
Cc: WANG Xuerui <kernel@xxxxxxxxxx>
Cc: Thomas Bogendoerfer <tsbogend@xxxxxxxxxxxxxxxx>
Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
Cc: Nicholas Piggin <npiggin@xxxxxxxxx>
Cc: Christophe Leroy <christophe.leroy@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: x86@xxxxxxxxxx
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: Arnd Bergmann <arnd@xxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
Cc: Mark Rutland <mark.rutland@xxxxxxx>
Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
Cc: Ian Rogers <irogers@xxxxxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Boqun Feng <boqun.feng@xxxxxxxxx>
Cc: Jiaxun Yang <jiaxun.yang@xxxxxxxxxxx>
Cc: Jun Yi <yijun@xxxxxxxxxxx>

Uros Bizjak (10):
  locking/atomic: Add missing cast to try_cmpxchg() fallbacks
  locking/atomic: Add generic try_cmpxchg{,64}_local support
  locking/alpha: Wire up local_try_cmpxchg
  locking/loongarch: Wire up local_try_cmpxchg
  locking/mips: Wire up local_try_cmpxchg
  locking/powerpc: Wire up local_try_cmpxchg
  locking/x86: Wire up local_try_cmpxchg
  locking/generic: Wire up local{,64}_try_cmpxchg
  locking/x86: Enable local{,64}_try_cmpxchg
  perf/ring_buffer: use local_try_cmpxchg in __perf_output_begin

 arch/alpha/include/asm/local.h              |  2 ++
 arch/loongarch/include/asm/local.h          |  2 ++
 arch/mips/include/asm/local.h               |  2 ++
 arch/powerpc/include/asm/local.h            | 11 ++++++
 arch/x86/include/asm/cmpxchg.h              |  6 ++++
 arch/x86/include/asm/local.h                |  2 ++
 include/asm-generic/local.h                 |  1 +
 include/asm-generic/local64.h               |  2 ++
 include/linux/atomic/atomic-arch-fallback.h | 40 ++++++++++++++++-----
 include/linux/atomic/atomic-instrumented.h  | 20 ++++++++++-
 kernel/events/ring_buffer.c                 |  5 +--
 scripts/atomic/gen-atomic-fallback.sh       |  6 +++-
 scripts/atomic/gen-atomic-instrumented.sh   |  2 +-
 13 files changed, 87 insertions(+), 14 deletions(-)

-- 
2.39.2