On Thu, May 2, 2024 at 2:22 PM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote: > > On Thu, May 2, 2024 at 10:37 AM Puranjay Mohan <puranjay@xxxxxxxxxx> wrote: > > > > > > Hi Everyone, > > > > While working on inlining bpf_get_smp_processor_id() in the ARM64 and > > RISCV JITs, I realized that these archs allow such optimizations because > > they keep some information like the per-cpu offset or the pointer to the > > task_struct in special system registers. > > > > So, I went through the list of all BPF helpers and made a list of > > helpers that we can inline in these JITs to make their usage much more > > optimized: > > > > I. ARM64 and RISC-V specific optimzations if inlined: > > > > A) Because pointer to tast_struct is available in a register: > > 1. bpf_get_current_pid_tgid() > > 2. bpf_get_current_task() > > These two are used really frequently, so it might make sense to > optimize them (and also bpf_get_current_task_btf(), of course), if > others agree with me. > > > 3. bpf_set_retval() > > 4. bpf_get_retval() > > 5. bpf_task_pt_regs() > > I'm leaning towards saying that probably not, unless we have a really > good reason to. Inlining is not free in terms of code maintenance and > complexity, so I wouldn't go and inline everything possible. But maybe > others have another opinion. > > > > 6. bpf_get_attach_cookie() > > definitely no, there are multiple implementations depending on > specific program type > > > > > B) Because per_cpu offset is available in a register: > > 1. bpf_this_cpu_ptr() > > maybe, but I don't think we inline at BPF instruction level, so > inlining in BPF JIT seems premature > > > > 2. bpf_get_numa_node_id() > > I'm not sure how actively this is used, so I'd say no to this one as well. > > > > > These can be inlined in the verifier too using the newly > > introduced per-cpu instruction. > > yep, I'd start with doing BPF assembly inlining for > bpf_this_cpu_ptr/bpf_per_cpu_ptr, tbh > > > > > II. These are very basic writes, can be inlined in the verifier or the JIT: > > 1. bpf_msg_apply_bytes() > > 2. bpf_msg_cork_bytes() > > 3. bpf_set_hash_invalid() > > I'd say this is also going overboard with inlining. +1 simplicity of logic is not a reason to inline it. I would only inline bpf_get_current_task[_btf]() and do it in the verifier. JITs should inline only if perf delta is really significant. I hope bpf_get_smp_processor_id() will be the only such example.