Puranjay Mohan <puranjay@xxxxxxxxxx> writes:

> Hi Everyone,
>
> While working on inlining bpf_get_smp_processor_id() in the ARM64 and
> RISCV JITs, I realized that these archs allow such optimizations because
> they keep some information like the per-cpu offset or the pointer to the
> task_struct in special system registers.
>
> So, I went through the list of all BPF helpers and made a list of
> helpers that we can inline in these JITs to make their usage much more
> optimized:
>
> I. ARM64 and RISC-V specific optimizations if inlined:
>
>    A) Because pointer to task_struct is available in a register:
>         1. bpf_get_current_pid_tgid()
>         2. bpf_get_current_task()

I tried inlining bpf_get_current_task() on ARM64.

Before:

  bpf_prog_6e2672bcc4451a42_trigger_get_current_task:
  ; task = (struct task_struct *)bpf_get_current_task();
    34:   mov     x10, #0xffffffffffff9838
    38:   movk    x10, #0x8027, lsl #16
    3c:   movk    x10, #0x8000, lsl #32
    40:   blr     x10
    44:   add     x7, x0, #0x0

After:

  bpf_prog_6e2672bcc4451a42_trigger_get_current_task:
  ; task = (struct task_struct *)bpf_get_current_task();
    34:   mrs     x7, sp_el0

In the non-inlined version there is a branch [blr x10] to:

  0xffff800080279838 bpf_get_current_task:
    <+0>:   mrs     x0, sp_el0
    <+4>:   ret

So, we only need a single instruction after inlining! The one mrs
replaces the three instructions that build the helper address, the
indirect call, the helper's own mrs/ret and the move of x0 into x7.

I just don't know the best way to benchmark this. In theory it should
be a clear improvement.

Thanks,
Puranjay
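As a sanity check on the non-inlined listing, here is a small user-space sketch (not kernel or JIT code) of how the mov/movk sequence builds the call target in x10. The function names movk() and helper_call_target() are made up for illustration; only the immediates come from the disassembly above.

```c
#include <stdint.h>

/*
 * Model of the A64 MOVK instruction: overwrite one 16-bit field of the
 * register at the given shift and keep all other bits unchanged.
 */
static uint64_t movk(uint64_t reg, uint16_t imm16, unsigned int shift)
{
	uint64_t mask = (uint64_t)0xffff << shift;

	return (reg & ~mask) | ((uint64_t)imm16 << shift);
}

/* Rebuild the value held in x10 right before "blr x10". */
static uint64_t helper_call_target(void)
{
	uint64_t x10 = 0xffffffffffff9838ULL;	/* mov  x10, #0xffffffffffff9838 */

	x10 = movk(x10, 0x8027, 16);		/* movk x10, #0x8027, lsl #16 */
	x10 = movk(x10, 0x8000, 32);		/* movk x10, #0x8000, lsl #32 */

	return x10;
}
```

helper_call_target() evaluates to 0xffff800080279838, which matches the address shown for bpf_get_current_task in the listing, so the three address-building instructions and the blr exist only to reach a helper that itself is a single mrs plus ret.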