Re: [PATCH bpf-next] bpf: Optimize emit_mov_imm64().

On 4/2/24 1:38 AM, Alexei Starovoitov wrote:
From: Alexei Starovoitov <ast@xxxxxxxxxx>

It turns out that bpf prog callback addresses, bpf prog addresses
used in bpf_trampoline, and 64-bit addresses in other cases can be
represented as a sign-extended 32-bit value.
According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82339
"Skylake has 0.64c throughput for mov r64, imm64, vs. 0.25 for mov r32, imm32."
So use the shorter encoding and the faster instruction when possible.
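
Side note: to make the size trade-off concrete as well, the three
encodings in play here are roughly (byte counts quoted from memory
from the instruction format, not measured from this patch):

	/* movabs $imm64, %rax : 48 B8 <imm64>    -> 10 bytes            */
	/* mov    $imm32, %eax :    B8 <imm32>    ->  5 bytes            */
	/* mov    $imm32, %rax : 48 C7 C0 <imm32> ->  7 bytes, simm32    */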

Special care is needed in jit_subprogs(), since a bpf_pseudo_func()
instruction cannot change its size during the last step of JIT.

Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
---
  arch/x86/net/bpf_jit_comp.c |  5 ++++-
  kernel/bpf/verifier.c       | 13 ++++++++++---
  2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 3b639d6f2f54..47abddac6dc3 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -816,9 +816,10 @@ static void emit_mov_imm32(u8 **pprog, bool sign_propagate,
  static void emit_mov_imm64(u8 **pprog, u32 dst_reg,
  			   const u32 imm32_hi, const u32 imm32_lo)
  {
+	u64 imm64 = ((u64)imm32_hi << 32) | (u32)imm32_lo;
  	u8 *prog = *pprog;
-	if (is_uimm32(((u64)imm32_hi << 32) | (u32)imm32_lo)) {
+	if (is_uimm32(imm64)) {
  		/*
  		 * For emitting plain u32, where sign bit must not be
  		 * propagated LLVM tends to load imm64 over mov32
@@ -826,6 +827,8 @@ static void emit_mov_imm64(u8 **pprog, u32 dst_reg,
  		 * 'mov %eax, imm32' instead.
  		 */
  		emit_mov_imm32(&prog, false, dst_reg, imm32_lo);
+	} else if (is_simm32(imm64)) {
+		emit_mov_imm32(&prog, true, dst_reg, imm32_lo);
  	} else {
  		/* movabsq rax, imm64 */
  		EMIT2(add_1mod(0x48, dst_reg), add_1reg(0xB8, dst_reg));
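
For readers not staring at bpf_jit_comp.c: the two predicates used
above live earlier in the same file and (quoting from memory) boil
down to:

	static bool is_simm32(s64 value)
	{
		return value == (s64)(s32)value;
	}

	static bool is_uimm32(u64 value)
	{
		return value == (u32)value;
	}

Since the uimm32 case is checked first, the new middle branch catches
exactly the values whose upper 33 bits are all ones, e.g. kernel-space
addresses, which fit the sign-extending 7-byte mov instead of the
10-byte movabs.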
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index edb650667f44..d4a338e7b5e7 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19145,12 +19145,19 @@ static int jit_subprogs(struct bpf_verifier_env *env)
  		env->insn_aux_data[i].call_imm = insn->imm;
  		/* point imm to __bpf_call_base+1 from JITs point of view */
  		insn->imm = 1;
-		if (bpf_pseudo_func(insn))
+		if (bpf_pseudo_func(insn)) {
+#if defined(MODULES_VADDR)
+			u64 addr = MODULES_VADDR;
+#else
+			u64 addr = VMALLOC_START;
+#endif

Is this beneficial for all archs? It seems this patch is mainly targeting x86.
Why not have a weak function like u64 bpf_jit_alloc_exec_start() which returns
MODULES_VADDR for x86, but leaves the rest as-is?

For example, arm64 has MODULES_VADDR defined, but its allocator uses the
vmalloc range instead, see bpf_jit_alloc_exec() there, so this is a different
pool, and it's also not clear whether this is better or worse wrt its imm
encoding.
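
Something along these lines, where bpf_jit_alloc_exec_start() is a
made-up name for illustration and does not exist today:

	/* kernel/bpf/core.c: default hint, keeps today's behavior */
	u64 __weak bpf_jit_alloc_exec_start(void)
	{
		return 0;
	}

	/* arch/x86/net/bpf_jit_comp.c: progs land in the modules area */
	u64 bpf_jit_alloc_exec_start(void)
	{
		return MODULES_VADDR;
	}

and jit_subprogs() would then do:

	u64 addr = bpf_jit_alloc_exec_start();

	if (addr) {
		insn[0].imm = (u32)addr;
		insn[1].imm = addr >> 32;
	} else {
		/* old behavior: just force a non-zero imm */
		insn[1].imm = 1;
	}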

  			/* jit (e.g. x86_64) may emit fewer instructions
  			 * if it learns a u32 imm is the same as a u64 imm.
-			 * Force a non zero here.
+			 * Set close enough to possible prog address.
  			 */
-			insn[1].imm = 1;
+			insn[0].imm = (u32)addr;
+			insn[1].imm = addr >> 32;
+		}
  	}
  	err = bpf_prog_alloc_jited_linfo(prog);
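
For anyone following along: a BPF_LD_IMM64 spans two insn slots, with
the low 32 bits in insn[0].imm and the high 32 bits in insn[1].imm, so
the x86 JIT later rebuilds the placeholder set above as:

	/* how the two imm halves reach emit_mov_imm64() */
	u64 imm64 = ((u64)insn[1].imm << 32) | (u32)insn[0].imm;

which is why the placeholder must fall into the same encoding class
(uimm32 / simm32 / full imm64) as the final prog address: the
instruction size picked here cannot change when the real address is
patched in later.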
