Hi Everyone,

While working on inlining bpf_get_smp_processor_id() in the ARM64 and RISC-V
JITs, I realized that these archs allow such optimizations because they keep
some information, like the per-cpu offset or the pointer to the task_struct,
in special system registers.

So, I went through the list of all BPF helpers and made a list of helpers
that we can inline in these JITs to make their usage much cheaper:

I. ARM64 and RISC-V specific optimizations if inlined:

   A) Because the pointer to task_struct is available in a register:
      1. bpf_get_current_pid_tgid()
      2. bpf_get_current_task()
      3. bpf_set_retval()
      4. bpf_get_retval()
      5. bpf_task_pt_regs()
      6. bpf_get_attach_cookie()

   B) Because the per-cpu offset is available in a register:
      1. bpf_this_cpu_ptr()
      2. bpf_get_numa_node_id()

      These can be inlined in the verifier too, using the newly introduced
      per-cpu instruction.

II. These are very basic writes that can be inlined in the verifier or the JIT:
      1. bpf_msg_apply_bytes()
      2. bpf_msg_cork_bytes()
      3. bpf_set_hash_invalid()

I will first try to inline all of these in the ARM64 JIT and measure the
performance improvement.

I am not sure what would be the best way to benchmark all of this inlining.
Andrii, can you suggest something for the benchmarking?

Looking forward to your thoughts on this.

Thanks,
Puranjay