Re: [PATCH bpf-next v2] bpf, arm64: Jit BPF_CALL to direct call when possible

Puranjay Mohan <puranjay@xxxxxxxxxx> · Wed, 04 Sep 2024 12:22:35 +0000

Xu Kuohai <xukuohai@xxxxxxxxxxxxxxx> writes:

> From: Xu Kuohai <xukuohai@xxxxxxxxxx>
>
> Currently, BPF_CALL is always jited to an indirect call. When the
> target is within the range of a direct call, BPF_CALL can be jited to
> a direct call instead.
>
> For example, the following BPF_CALL
>
>     call __htab_map_lookup_elem
>
> is always jited to an indirect call:
>
>     mov     x10, #0xffffffffffff18f4
>     movk    x10, #0x821, lsl #16
>     movk    x10, #0x8000, lsl #32
>     blr     x10
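
For readers following the disassembly: the three instructions before the
blr build the 64-bit target in x10 sixteen bits at a time. A small Python
sketch of that arithmetic, assuming only the usual MOVK semantics (the
helper name movk is mine, not something from the patch):

```python
MASK64 = (1 << 64) - 1

def movk(reg, imm16, shift):
    """Model arm64 MOVK: overwrite one 16-bit field, keep the other bits."""
    field = 0xFFFF << shift
    return ((reg & ~field) | (imm16 << shift)) & MASK64

# Values copied from the quoted disassembly.
x10 = 0xFFFFFFFFFFFF18F4        # mov  x10, #0xffffffffffff18f4
x10 = movk(x10, 0x0821, 16)     # movk x10, #0x821, lsl #16
x10 = movk(x10, 0x8000, 32)     # movk x10, #0x8000, lsl #32
target = x10                    # 0xffff8000082118f4, the address blr jumps to
```

So the indirect form costs four instructions (16 bytes) per call, against
one instruction for a bl.
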
>
> When the address of target __htab_map_lookup_elem is within the range of
> direct call, the BPF_CALL can be jited to:
>
>     bl      0xfffffffffd33bc98
>
> This patch implements that optimization by emitting arm64 direct calls
> for BPF_CALL when possible and indirect calls otherwise.
>
> Without this patch, the jit works as follows.
>
> 1. First pass
>    A. Determine jited position and size for each bpf instruction.
>    B. Compute the jited image size.
>
> 2. Allocate jited image with size computed in step 1.
>
> 3. Second pass
>    A. Adjust jump offset for jump instructions.
>    B. Write the final image.
>
> This works because, for a given bpf prog, regardless of where the jited
> image is allocated, the jited result for each instruction is fixed. The
> second pass differs from the first only in adjusting the jump offsets,
> like changing "jmp imm1" to "jmp imm2", while the position and size of
> the "jmp" instruction remain unchanged.
>
> Now consider whether to jit BPF_CALL to an arm64 direct or indirect
> call instruction. The choice depends solely on the jump offset: a
> direct call if the jump offset is within 128MB, an indirect call
> otherwise.
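
To spell out the range test: bl encodes a signed 26-bit word offset, so
the byte offset must be 4-byte aligned and lie in [-128MB, 128MB). A
minimal Python sketch of that check follows; the function name
fits_direct_call is hypothetical and is not the helper the patch uses:

```python
SZ_128M = 128 * 1024 * 1024

def fits_direct_call(call_addr: int, target_addr: int) -> bool:
    """True if a bl placed at call_addr can reach target_addr."""
    offset = target_addr - call_addr
    # bl holds a signed 26-bit immediate scaled by 4 bytes.
    return offset % 4 == 0 and -SZ_128M <= offset < SZ_128M
```

The upper bound is exclusive: the farthest forward target is
128MB - 4 bytes away, while the farthest backward target is exactly
128MB away.
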
>
> For a given BPF_CALL, the target address is known, so the jump offset is
> decided by the jited address of the BPF_CALL instruction. In other words,
> for a given bpf prog, the jited result for each BPF_CALL is determined
> by its jited address.
>
> The jited address for a BPF_CALL is the jited image address plus the
> total jited size of all preceding instructions. For a given bpf prog,
> there are clearly no BPF_CALL instructions before the first BPF_CALL
> instruction. Since the jited results for all instructions other than
> BPF_CALL are fixed, the total jited size preceding the first
> BPF_CALL is also fixed. Therefore, once the jited image is allocated,
> the jited address for the first BPF_CALL is fixed.
>
> Now that the jited result for the first BPF_CALL is fixed, the jited
> results for all instructions preceding the second BPF_CALL are fixed.
> So the jited address and result for the second BPF_CALL are also fixed.
>
> Similarly, we can conclude that the jited addresses and results for all
> subsequent BPF_CALL instructions are fixed.
>
> This means that, for a given bpf prog, once the jited image is allocated,
> the jited address and result for all instructions, including all BPF_CALL
> instructions, are fixed.
>
> Based on this observation, with this patch, the jit works as follows.
>
> 1. First pass
>    Estimate the maximum jited image size. In this pass, all BPF_CALLs
>    are jited to arm64 indirect calls, since the jump offsets are unknown
>    until the jited image is allocated.
>
> 2. Allocate jited image with size estimated in step 1.
>
> 3. Second pass
>    A. Determine the jited result for each BPF_CALL.
>    B. Determine jited address and size for each bpf instruction.
>
> 4. Third pass
>    A. Adjust jump offset for jump instructions.
>    B. Write the final image.
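
The way I convinced myself of the pass ordering is with a toy model like
the one below. It is only a sketch under my own assumptions: a program is
a list of ("insn", size) and ("call", target) tuples, an indirect call is
the 4-instruction sequence from the example, and all names are made up
for illustration. It covers passes 1 and 2; pass 3 (fixing jump offsets
and writing the image) is left out.

```python
SZ_128M = 128 * 1024 * 1024
INSN = 4                     # every arm64 instruction is 4 bytes
INDIRECT_CALL = 4 * INSN     # mov + movk + movk + blr
DIRECT_CALL = INSN           # a single bl

def call_size(call_addr, target):
    off = target - call_addr
    return DIRECT_CALL if -SZ_128M <= off < SZ_128M else INDIRECT_CALL

def estimate_size(prog):
    """Pass 1: upper bound, every call assumed to be indirect."""
    return sum(INDIRECT_CALL if kind == "call" else arg for kind, arg in prog)

def layout(prog, image_base):
    """Pass 2: walk in order; a call's address depends only on what precedes it."""
    offsets, off = [], 0
    for kind, arg in prog:
        offsets.append(off)
        off += call_size(image_base + off, arg) if kind == "call" else arg
    return offsets, off

# Toy program: one call to a nearby target, one to a target out of bl range.
base = 0x10000000
prog = [("insn", 8), ("call", base + 0x1000),
        ("insn", 4), ("call", base + 2 * SZ_128M)]
estimate = estimate_size(prog)
offsets, final_size = layout(prog, base)
```

With these toy numbers the estimate is 44 bytes and the final layout is
32 bytes, so the image allocated in step 2 can only be too large, never
too small.
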
>
> Signed-off-by: Xu Kuohai <xukuohai@xxxxxxxxxx>

Thanks for working on this. I have tried to reason about all the
possible edge cases that I could think of and this looks good to me:

Reviewed-by: Puranjay Mohan <puranjay@xxxxxxxxxx>