Xu Kuohai <xukuohai@xxxxxxxxxxxxxxx> writes:

> From: Xu Kuohai <xukuohai@xxxxxxxxxx>
>
> Currently, BPF_CALL is always jited to indirect call. When target is
> within the range of direct call, BPF_CALL can be jited to direct call.
>
> For example, the following BPF_CALL
>
>     call __htab_map_lookup_elem
>
> is always jited to indirect call:
>
>     mov     x10, #0xffffffffffff18f4
>     movk    x10, #0x821, lsl #16
>     movk    x10, #0x8000, lsl #32
>     blr     x10
>
> When the address of target __htab_map_lookup_elem is within the range of
> direct call, the BPF_CALL can be jited to:
>
>     bl      0xfffffffffd33bc98
>
> This patch does such jit optimization by emitting arm64 direct calls for
> BPF_CALL when possible, indirect calls otherwise.
>
> Without this patch, the jit works as follows.
>
> 1. First pass
>    A. Determine jited position and size for each bpf instruction.
>    B. Compute the jited image size.
>
> 2. Allocate jited image with size computed in step 1.
>
> 3. Second pass
>    A. Adjust jump offset for jump instructions.
>    B. Write the final image.
>
> This works because, for a given bpf prog, regardless of where the jited
> image is allocated, the jited result for each instruction is fixed. The
> second pass differs from the first only in adjusting the jump offsets,
> like changing "jmp imm1" to "jmp imm2", while the position and size of
> the "jmp" instruction remain unchanged.
>
> Now consider whether to jit BPF_CALL to an arm64 direct or indirect call
> instruction. The choice depends solely on the jump offset: direct call
> if the jump offset is within 128MB, indirect call otherwise.
>
> For a given BPF_CALL, the target address is known, so the jump offset is
> decided by the jited address of the BPF_CALL instruction. In other words,
> for a given bpf prog, the jited result for each BPF_CALL is determined
> by its jited address.
>
> The jited address for a BPF_CALL is the jited image address plus the
> total jited size of all preceding instructions. For a given bpf prog,
> there are clearly no BPF_CALL instructions before the first BPF_CALL
> instruction. Since the jited results for all instructions other than
> BPF_CALL are fixed, the total jited size preceding the first
> BPF_CALL is also fixed. Therefore, once the jited image is allocated,
> the jited address for the first BPF_CALL is fixed.
>
> Now that the jited result for the first BPF_CALL is fixed, the jited
> results for all instructions preceding the second BPF_CALL are fixed.
> So the jited address and result for the second BPF_CALL are also fixed.
>
> Similarly, we can conclude that the jited addresses and results for all
> subsequent BPF_CALL instructions are fixed.
>
> This means that, for a given bpf prog, once the jited image is allocated,
> the jited address and result for all instructions, including all BPF_CALL
> instructions, are fixed.
>
> Based on this observation, with this patch, the jit works as follows.
>
> 1. First pass
>    Estimate the maximum jited image size. In this pass, all BPF_CALLs
>    are jited to arm64 indirect calls since the jump offsets are unknown
>    because the jited image is not allocated.
>
> 2. Allocate jited image with size estimated in step 1.
>
> 3. Second pass
>    A. Determine the jited result for each BPF_CALL.
>    B. Determine jited address and size for each bpf instruction.
>
> 4. Third pass
>    A. Adjust jump offset for jump instructions.
>    B. Write the final image.
>
> Signed-off-by: Xu Kuohai <xukuohai@xxxxxxxxxx>

Thanks for working on this. I have tried to reason about all the possible
edge cases that I could think of and this looks good to me:

Reviewed-by: Puranjay Mohan <puranjay@xxxxxxxxxx>
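
For readers following the layout argument, here is a minimal toy model of
it in C. It is a sketch only, not the code from the patch: the names
(toy_insn, in_direct_call_range, estimate_max_size, layout) are
hypothetical, and the call sizes of 4 and 16 bytes are assumptions taken
from the example sequences quoted above (a single bl versus mov + two
movk + blr). The first function mirrors the 128MB check, the second the
first pass (every call counted as indirect), and the third the second
pass (walk in order once the image address is known, so each call's
address and form are fixed by everything before it).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* arm64 BL carries a signed 26-bit word offset: reach is [-128MB, +128MB). */
#define BL_RANGE            (128LL * 1024 * 1024)
/* Assumed sizes, based on the example in the commit message. */
#define DIRECT_CALL_SIZE    4   /* bl */
#define INDIRECT_CALL_SIZE  16  /* mov + movk + movk + blr */

enum insn_kind { INSN_OTHER, INSN_CALL };

/* Hypothetical stand-in for one BPF instruction. */
struct toy_insn {
    enum insn_kind kind;
    uint32_t size;    /* fixed jited size in bytes (INSN_OTHER only) */
    uint64_t target;  /* call target address (INSN_CALL only) */
};

/* Can a BL placed at address pc reach target? */
static bool in_direct_call_range(uint64_t pc, uint64_t target)
{
    int64_t off = (int64_t)(target - pc);

    /* arm64 instructions are 4-byte aligned. */
    assert((pc & 3) == 0 && (target & 3) == 0);
    return off >= -BL_RANGE && off < BL_RANGE;
}

/* First pass: image address unknown, so assume every call is indirect. */
static size_t estimate_max_size(const struct toy_insn *insns, size_t n)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += insns[i].kind == INSN_CALL ? INDIRECT_CALL_SIZE
                                            : insns[i].size;
    return total;
}

/*
 * Second pass: image address known. Walking in program order fixes the
 * address of each call, which fixes its form, which fixes the address
 * of everything after it. Returns the bytes actually used.
 */
static size_t layout(const struct toy_insn *insns, size_t n, uint64_t image,
                     size_t *offsets, bool *direct)
{
    size_t off = 0;

    for (size_t i = 0; i < n; i++) {
        offsets[i] = off;
        direct[i] = false;
        if (insns[i].kind == INSN_CALL) {
            direct[i] = in_direct_call_range(image + off, insns[i].target);
            off += direct[i] ? DIRECT_CALL_SIZE : INDIRECT_CALL_SIZE;
        } else {
            off += insns[i].size;
        }
    }
    return off;
}
```

Because a direct call is never larger than an indirect one in this model,
the size returned by layout() cannot exceed the first-pass estimate, which
is why allocating with the estimate is safe.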