On 8/13/24 14:26, Christoph Lameter via B4 Relay wrote:
From: "Christoph Lameter (Ampere)" <cl@xxxxxxxxxx>
Some architectures support load acquire which can save us a memory
barrier and save some cycles.
A typical sequence
do {
seq = read_seqcount_begin(&s);
<something>
} while (read_seqcount_retry(&s, seq));
requires 13 cycles on ARM64 for an empty loop. Two read memory barriers are
needed. One for each of the seqcount_* functions.
We can replace the first read barrier with a load acquire of
the seqcount which saves us one barrier.
On ARM64 doing so reduces the cycle count from 13 to 8.
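To make the difference concrete, here is a minimal userspace sketch using C11 atomics rather than the kernel primitives. The names (struct seqcount_sketch, read_begin_fence(), read_begin_acquire(), read_retry(), write_begin(), write_end()) are hypothetical and are not the code from this patch; the acquire fence is only a stand-in for smp_rmb(). It shows the two reader-side variants: a plain load followed by a fence, and a single load with acquire semantics.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the kernel's seqcount_t. */
struct seqcount_sketch {
	atomic_uint sequence;
};

/* Variant 1: relaxed load, then a separate fence (stand-in for smp_rmb()). */
static unsigned read_begin_fence(struct seqcount_sketch *s)
{
	unsigned seq;

	/* An odd value means a writer is in progress; spin until it is even. */
	while ((seq = atomic_load_explicit(&s->sequence,
					   memory_order_relaxed)) & 1)
		;
	atomic_thread_fence(memory_order_acquire);
	return seq;
}

/* Variant 2: the load itself carries acquire ordering, no extra fence. */
static unsigned read_begin_acquire(struct seqcount_sketch *s)
{
	unsigned seq;

	while ((seq = atomic_load_explicit(&s->sequence,
					   memory_order_acquire)) & 1)
		;
	return seq;
}

/* The retry side still needs its barrier before re-reading the counter. */
static int read_retry(struct seqcount_sketch *s, unsigned seq)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&s->sequence,
				    memory_order_relaxed) != seq;
}

/* Writer side, single writer assumed: make the count odd, then order it
 * before the data writes that follow. */
static void write_begin(struct seqcount_sketch *s)
{
	unsigned seq = atomic_load_explicit(&s->sequence,
					    memory_order_relaxed);

	assert((seq & 1) == 0);	/* no write section already open */
	atomic_store_explicit(&s->sequence, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

/* Make the count even again; release orders the data writes before it. */
static void write_end(struct seqcount_sketch *s)
{
	unsigned seq = atomic_load_explicit(&s->sequence,
					    memory_order_relaxed);

	atomic_store_explicit(&s->sequence, seq + 1, memory_order_release);
}
```

Only the begin side changes between the two variants; read_retry() keeps its barrier in both, which matches the "saves us one barrier" description above.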
Signed-off-by: Christoph Lameter (Ampere) <cl@xxxxxxxxxx>
---
arch/Kconfig | 5 +++++
arch/arm64/Kconfig | 1 +
include/linux/seqlock.h | 41 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 47 insertions(+)
diff --git a/arch/Kconfig b/arch/Kconfig
index 975dd22a2dbd..3f8867110a57 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1600,6 +1600,11 @@ config ARCH_HAS_KERNEL_FPU_SUPPORT
Architectures that select this option can run floating-point code in
the kernel, as described in Documentation/core-api/floating-point.rst.
+config ARCH_HAS_ACQUIRE_RELEASE
+ bool
+ help
+ Architectures that support acquire / release can avoid memory fences
+
source "kernel/gcov/Kconfig"
source "scripts/gcc-plugins/Kconfig"
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a2f8ff354ca6..19e34fff145f 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -39,6 +39,7 @@ config ARM64
select ARCH_HAS_PTE_DEVMAP
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_HW_PTE_YOUNG
+ select ARCH_HAS_ACQUIRE_RELEASE
select ARCH_HAS_SETUP_DMA_OPS
select ARCH_HAS_SET_DIRECT_MAP
select ARCH_HAS_SET_MEMORY
Do we need a new ARCH flag? I believe barrier APIs like
smp_load_acquire() will fall back to a full barrier on those
architectures that don't define their own smp_load_acquire().
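As a rough illustration of that fallback, here is a userspace sketch in C11 atomics. To my understanding the generic definition in include/asm-generic/barrier.h is a READ_ONCE() followed by a full barrier, but the function names below (load_acquire_fallback(), load_acquire_native()) are made up for this example and are not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Fallback shape: a plain (relaxed) load followed by a full fence.
 * This models an architecture without a native load-acquire instruction.
 */
static unsigned load_acquire_fallback(atomic_uint *p)
{
	unsigned v;

	assert(p != NULL);
	v = atomic_load_explicit(p, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* full barrier */
	return v;
}

/*
 * Native shape: the load itself has acquire semantics, which a compiler
 * can map to a single instruction such as LDAR on arm64.
 */
static unsigned load_acquire_native(atomic_uint *p)
{
	assert(p != NULL);
	return atomic_load_explicit(p, memory_order_acquire);
}
```

Both return the same value and both are correct; the difference is cost. On an architecture that takes the fallback path, the full barrier may well be more expensive than the read barrier it would replace in the seqcount code.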
BTW, acquire/release can be considered memory barriers too. Maybe you
are talking about preferring acquire/release barriers over read/write
barriers. Right?
Cheers,
Longman