On 3/14/22 9:48 AM, Xu Kuohai wrote:
> The current BPF store/load instruction is translated by the JIT into two
> instructions. The first instruction moves the immediate offset into a
> temporary register. The second instruction uses this temporary register
> to do the real store/load. In fact, arm64 supports addressing with
> immediate offsets, so this patch introduces an optimization that uses an
> arm64 str/ldr instruction with an immediate offset when the offset fits.
>
> Example of generated instruction for r2 = *(u64 *)(r1 + 0):
>
> without optimization:
> mov x10, 0
> ldr x1, [x0, x10]
>
> with optimization:
> ldr x1, [x0, 0]
>
> If the offset is negative, is not aligned correctly, or exceeds the max
> value, fall back to the use of a temporary register.
>
> Result for test_bpf:
>
> # dmesg -D
> # insmod test_bpf.ko
> # dmesg | grep Summary
> test_bpf: Summary: 1009 PASSED, 0 FAILED, [997/997 JIT'ed]
> test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [8/8 JIT'ed]
> test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED
>
> Signed-off-by: Xu Kuohai <xukuohai@xxxxxxxxxx>
[...]

Thanks for working on this and also including the result for test_bpf! Does it also contain corner cases where the fallback to the temporary register is triggered? (If not, let's add more test cases for it.)

Could you split this into two patches: one that touches arch/arm64/lib/insn.c and arch/arm64/include/asm/insn.h for the instruction encoder, and the other for the JIT-only bits?

Will, would you be okay if we route this via bpf-next with your Ack, or do we need to pull a feature branch again?

Thanks,
Daniel