On 2022/3/15 6:11, Daniel Borkmann wrote:
On 3/14/22 9:48 AM, Xu Kuohai wrote:
Currently, a BPF store/load instruction is translated by the JIT into two
instructions: the first moves the immediate offset into a temporary
register, and the second uses this temporary register to perform the
actual store/load.
In fact, arm64 supports addressing with immediate offsets. So this patch
introduces an optimization that uses an arm64 str/ldr instruction with an
immediate offset when the offset fits.
Example of the generated instructions for r2 = *(u64 *)(r1 + 0):

without optimization:
  mov x10, 0
  ldr x1, [x0, x10]

with optimization:
  ldr x1, [x0, 0]
If the offset is negative, is not correctly aligned, or exceeds the
maximum encodable value, fall back to using the temporary register.
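To make the fallback conditions concrete, here is a minimal sketch of the
kind of check the JIT can use to decide whether an offset is encodable in
the arm64 LDR/STR unsigned-immediate form. The helper name and exact
limits here are assumptions for illustration, not the actual function
from the patch:

  /*
   * Hypothetical helper (illustrative sketch only): an arm64 LDR/STR with
   * an unsigned immediate can encode offsets that are non-negative,
   * aligned to the access size (1 << scale bytes, e.g. scale = 3 for u64),
   * and no larger than 4095 << scale after scaling.
   */
  static bool ldst_imm_offset_fits(s64 offset, int scale)
  {
          if (offset < 0)
                  return false;             /* negative: use temp register */
          if (offset & ((1 << scale) - 1))
                  return false;             /* misaligned: use temp register */
          return (offset >> scale) <= 4095; /* 12-bit scaled immediate */
  }

Any offset failing this check would keep the existing two-instruction
sequence with the temporary register.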
Result for test_bpf:
# dmesg -D
# insmod test_bpf.ko
# dmesg | grep Summary
test_bpf: Summary: 1009 PASSED, 0 FAILED, [997/997 JIT'ed]
test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [8/8 JIT'ed]
test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED
Signed-off-by: Xu Kuohai <xukuohai@xxxxxxxxxx>
[...]
Thanks for working on this and also including the result for test_bpf!
Does it also contain corner cases where the rollback to the temporary register is
triggered? (If not, let's add more test cases to it.)
Yes, I'll check and add some corner cases.
Could you split this into two patches, one that touches
arch/arm64/lib/insn.c
and arch/arm64/include/asm/insn.h for the instruction encoder, and then the
other part for the JIT-only bits?
OK, will split in v3.
Will, would you be okay if we route this via bpf-next with your Ack, or
do we need to pull a feature branch again?
I'm fine with bpf-next.
Thanks,
Daniel