Re: [PATCH bpf-next 2/2] bpf, arm64: Emit A64_{ADD,SUB}_I when possible in emit_{lse,ll_sc}_atomic()

On 12/28/2024 7:36 AM, Peilin Ye wrote:
Currently in emit_{lse,ll_sc}_atomic(), if there is an offset, we add it
to the base address by emitting two instructions, for example:

   if (off) {
           emit_a64_mov_i(1, tmp, off, ctx);
           emit(A64_ADD(1, tmp, tmp, dst), ctx);
   ...

As pointed out by Xu, we can combine the above into a single A64_ADD_I
instruction if 'is_addsub_imm(off)' is true, or an A64_SUB_I, if
'is_addsub_imm(-off)' is true.
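
For illustration only, the selection rule described above can be modelled
outside the kernel roughly as follows. This is a sketch, not the JIT code:
addsub_imm_ok() is a stand-in that assumes is_addsub_imm() accepts a 12-bit
unsigned immediate, optionally shifted left by 12 (the AArch64 ADD/SUB
immediate encoding), and the enum and pick_addr_form() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the instruction sequences the JIT could emit. */
enum addr_form { ADDR_NONE, ADDR_ADD_I, ADDR_SUB_I, ADDR_MOV_ADD };

/* Assumed stand-in for is_addsub_imm(): imm12, or imm12 shifted left by 12. */
static bool addsub_imm_ok(uint32_t imm)
{
	return !(imm & ~UINT32_C(0xfff)) || !(imm & ~UINT32_C(0xfff000));
}

/* Pick the cheapest way to compute "base + off" for a 16-bit BPF offset. */
static enum addr_form pick_addr_form(int16_t off)
{
	if (!off)
		return ADDR_NONE;		/* use the base register as-is */
	if (addsub_imm_ok((uint32_t)(int32_t)off))
		return ADDR_ADD_I;		/* one ADD (immediate) */
	if (addsub_imm_ok((uint32_t)(-(int32_t)off)))
		return ADDR_SUB_I;		/* one SUB (immediate) */
	return ADDR_MOV_ADD;			/* mov into tmp, then ADD (register) */
}
```

Under that assumption, small offsets such as 8 or -8 need a single
instruction, while an offset such as 4097 still takes the two-instruction
fallback.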

Suggested-by: Xu Kuohai <xukuohai@xxxxxxxxxxxxxxx>
Signed-off-by: Peilin Ye <yepeilin@xxxxxxxxxx>
---
Hi all,

This was pointed out by Xu in [1].  Tested on x86-64, using
PLATFORM=aarch64 CROSS_COMPILE=aarch64-linux-gnu- vmtest.sh:

LSE:
   * ./test_progs-cpuv4 -a atomics,arena_atomics
     2/15 PASSED, 0 SKIPPED, 0 FAILED
   * ./test_verifier
     790 PASSED, 0 SKIPPED, 0 FAILED

LL/SC:
(In vmtest.sh, changed '-cpu' QEMU option from 'cortex-a76' to
  'cortex-a57', to make LSE atomics unavailable.)
   * ./test_progs-cpuv4 -a atomics
     1/7 PASSED, 0 SKIPPED, 0 FAILED
   * ./test_verifier
     790 PASSED, 0 SKIPPED, 0 FAILED

Thanks,
Peilin Ye

[1] https://lore.kernel.org/bpf/f704019d-a8fa-4cf5-a606-9d8328360a3e@xxxxxxxxxxxxxxx/

  arch/arm64/net/bpf_jit_comp.c | 26 ++++++++++++++++++--------
  1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 9040033eb1ea..f15bbe92fed9 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -649,8 +649,14 @@ static int emit_lse_atomic(const struct bpf_insn *insn, struct jit_ctx *ctx)
  	u8 reg = dst;
  
  	if (off) {
-		emit_a64_mov_i(1, tmp, off, ctx);
-		emit(A64_ADD(1, tmp, tmp, dst), ctx);
+		if (is_addsub_imm(off)) {
+			emit(A64_ADD_I(1, tmp, reg, off), ctx);
+		} else if (is_addsub_imm(-off)) {
+			emit(A64_SUB_I(1, tmp, reg, -off), ctx);
+		} else {
+			emit_a64_mov_i(1, tmp, off, ctx);
+			emit(A64_ADD(1, tmp, tmp, reg), ctx);
+		}
  		reg = tmp;
  	}
  	if (arena) {
@@ -721,7 +727,7 @@ static int emit_ll_sc_atomic(const struct bpf_insn *insn, struct jit_ctx *ctx)
  	const s32 imm = insn->imm;
  	const s16 off = insn->off;
  	const bool isdw = BPF_SIZE(code) == BPF_DW;
-	u8 reg;
+	u8 reg = dst;
  	s32 jmp_offset;
  
  	if (BPF_MODE(code) == BPF_PROBE_ATOMIC) {
@@ -730,11 +736,15 @@ static int emit_ll_sc_atomic(const struct bpf_insn *insn, struct jit_ctx *ctx)
  		return -EINVAL;
  	}
  
-	if (!off) {
-		reg = dst;
-	} else {
-		emit_a64_mov_i(1, tmp, off, ctx);
-		emit(A64_ADD(1, tmp, tmp, dst), ctx);
+	if (off) {
+		if (is_addsub_imm(off)) {
+			emit(A64_ADD_I(1, tmp, reg, off), ctx);
+		} else if (is_addsub_imm(-off)) {
+			emit(A64_SUB_I(1, tmp, reg, -off), ctx);
+		} else {
+			emit_a64_mov_i(1, tmp, off, ctx);
+			emit(A64_ADD(1, tmp, tmp, reg), ctx);
+		}
  		reg = tmp;
  	}

Thanks, this looks good to me, but we now have several repetitive code
snippets like this. It would be better to refactor them into a common
function.
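
One possible shape for such a helper is sketched below as a standalone mock,
so it can be compiled outside the kernel. Everything here is hypothetical:
emit_addr_with_off(), struct mock_ctx and mock_emit() are made-up names, the
mock only records which kind of instruction would be emitted, and
fits_addsub_imm() assumes is_addsub_imm() accepts an imm12, optionally
shifted left by 12.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mock instruction kinds, standing in for the A64_* emit macros. */
enum mock_op { OP_ADD_I, OP_SUB_I, OP_MOV_I, OP_ADD };

#define MOCK_MAX_OPS 4

/* Mock JIT context: just a log of what would have been emitted. */
struct mock_ctx {
	enum mock_op ops[MOCK_MAX_OPS];
	int n;
};

static void mock_emit(struct mock_ctx *ctx, enum mock_op op)
{
	if (ctx->n < MOCK_MAX_OPS)
		ctx->ops[ctx->n++] = op;
}

/* Assumed stand-in for is_addsub_imm(). */
static bool fits_addsub_imm(uint32_t imm)
{
	return !(imm & ~UINT32_C(0xfff)) || !(imm & ~UINT32_C(0xfff000));
}

/*
 * Hypothetical common helper: compute "dst + off" into tmp when off is
 * non-zero, and return the register that holds the final address.
 */
static uint8_t emit_addr_with_off(struct mock_ctx *ctx, uint8_t dst,
				  uint8_t tmp, int16_t off)
{
	if (!off)
		return dst;

	if (fits_addsub_imm((uint32_t)(int32_t)off)) {
		mock_emit(ctx, OP_ADD_I);	/* add tmp, dst, #off */
	} else if (fits_addsub_imm((uint32_t)(-(int32_t)off))) {
		mock_emit(ctx, OP_SUB_I);	/* sub tmp, dst, #-off */
	} else {
		mock_emit(ctx, OP_MOV_I);	/* mov tmp, #off */
		mock_emit(ctx, OP_ADD);		/* add tmp, tmp, dst */
	}
	return tmp;
}
```

With something along these lines, each call site would shrink to a single
assignment such as "reg = emit_addr_with_off(ctx, dst, tmp, off);".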




