On Thu, May 2, 2024 at 8:19 AM Puranjay Mohan <puranjay@xxxxxxxxxx> wrote:
>
> From: Puranjay Mohan <puranjay12@xxxxxxxxx>
>
> Support an instruction for resolving absolute addresses of per-CPU
> data from their per-CPU offsets. This instruction is internal-only and
> users are not allowed to use them directly. They will only be used for
> internal inlining optimizations for now between BPF verifier and BPF
> JITs.
>
> Since commit 7158627686f0 ("arm64: percpu: implement optimised pcpu
> access using tpidr_el1"), the per-cpu offset for the CPU is stored in
> the tpidr_el1/2 register of that CPU.
>
> To support this BPF instruction in the ARM64 JIT, the following ARM64
> instructions are emitted:
>
> mov dst, src            // Move src to dst, if src != dst
> mrs tmp, tpidr_el1/2    // Move per-cpu offset of the current cpu in tmp.
> add dst, dst, tmp       // Add the per cpu offset to the dst.
>
> To measure the performance improvement provided by this change, the
> benchmark in [1] was used:
>
> Before:
> glob-arr-inc : 23.597 ± 0.012M/s
> arr-inc      : 23.173 ± 0.019M/s
> hash-inc     : 12.186 ± 0.028M/s
>
> After:
> glob-arr-inc : 23.819 ± 0.034M/s
> arr-inc      : 23.285 ± 0.017M/s
> hash-inc     : 12.419 ± 0.011M/s
>
> [1] https://github.com/anakryiko/linux/commit/8dec900975ef
>
> Signed-off-by: Puranjay Mohan <puranjay12@xxxxxxxxx>
> Acked-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> ---
>  arch/arm64/include/asm/insn.h |  7 +++++++
>  arch/arm64/lib/insn.c         | 11 +++++++++++
>  arch/arm64/net/bpf_jit.h      |  6 ++++++
>  arch/arm64/net/bpf_jit_comp.c | 14 ++++++++++++++
>  4 files changed, 38 insertions(+)

Catalin, Will, Zi,

Any objections to landing these patches into the bpf-next tree?
Can we get some acks from ARM64 folks?

Thanks!

>
> diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
> index db1aeacd4cd9..8de0e39b29f3 100644
> --- a/arch/arm64/include/asm/insn.h
> +++ b/arch/arm64/include/asm/insn.h
> @@ -135,6 +135,11 @@ enum aarch64_insn_special_register {
>         AARCH64_INSN_SPCLREG_SP_EL2 = 0xF210
>  };
>
> [...]
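
For readers following along, here is a rough userspace model of what the
three emitted instructions compute. It is only a sketch, not code from the
patch: every name ending in _sketch is hypothetical, and the offset table is
an assumed stand-in for the value that tpidr_el1/2 holds on each CPU (on real
hardware the mrs reads the register of the current CPU, with no index).

```c
#include <stdint.h>

#define NR_CPUS_SKETCH 4 /* hypothetical CPU count, for illustration only */

/* Assumed stand-in for the per-CPU offset stored in tpidr_el1/2. */
const uint64_t percpu_offset_sketch[NR_CPUS_SKETCH] = {
	0x0000, 0x1000, 0x2000, 0x3000
};

/* Models the emitted sequence for one CPU. */
uint64_t resolve_percpu_addr_sketch(uint64_t src, unsigned int cpu)
{
	uint64_t dst = src;                       /* mov dst, src         */
	uint64_t tmp = percpu_offset_sketch[cpu]; /* mrs tmp, tpidr_el1/2 */

	dst = dst + tmp;                          /* add dst, dst, tmp    */
	return dst;
}

/* The mov is only emitted if src != dst, so the sequence is 2 or 3 insns. */
int emitted_insn_count_sketch(int dst_reg, int src_reg)
{
	return (dst_reg != src_reg) ? 3 : 2;
}
```

With these assumed offsets, a per-CPU offset of 0x40 resolved on CPU 2 gives
0x2040, and the sequence shrinks to two instructions when dst and src are the
same register.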