On 3/14/22 9:48 AM, Xu Kuohai wrote:
> The current BPF store/load instruction is translated by the JIT into two
> instructions. The first instruction moves the immediate offset into a
> temporary register. The second instruction uses this temporary register
> to do the real store/load. In fact, arm64 supports addressing with
> immediate offsets, so this patch introduces an optimization that uses an
> arm64 str/ldr instruction with an immediate offset when the offset fits.
>
> Example of generated instruction for r2 = *(u64 *)(r1 + 0):
>
> without optimization:
> mov x10, 0
> ldr x1, [x0, x10]
>
> with optimization:
> ldr x1, [x0, 0]
>
> If the offset is negative, is not aligned correctly, or exceeds the max
> value, fall back to the use of a temporary register.
>
> Result for test_bpf:
>
> # dmesg -D
> # insmod test_bpf.ko
> # dmesg | grep Summary
> test_bpf: Summary: 1009 PASSED, 0 FAILED, [997/997 JIT'ed]
> test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [8/8 JIT'ed]
> test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED
>
> Signed-off-by: Xu Kuohai <xukuohai@xxxxxxxxxx>
[...]

Thanks for working on this and also including the result for test_bpf! Does it also contain corner cases where the fallback to the temporary register is triggered? (If not, let's add more test cases for it.)

Could you split this into two patches: one that touches arch/arm64/lib/insn.c and arch/arm64/include/asm/insn.h for the instruction encoder, and the other for the JIT-only bits?

Will, would you be okay if we route this via bpf-next with your Ack, or do we need to pull a feature branch again?

Thanks,
Daniel