Re: On inlining more helpers in the JITs or the verifier

On Thu, May 2, 2024 at 10:37 AM Puranjay Mohan <puranjay@xxxxxxxxxx> wrote:
>
>
> Hi Everyone,
>
> While working on inlining bpf_get_smp_processor_id() in the ARM64 and
> RISCV JITs, I realized that these archs allow such optimizations because
> they keep some information like the per-cpu offset or the pointer to the
> task_struct in special system registers.
>
> So, I went through the list of all BPF helpers and made a list of
> helpers that we can inline in these JITs to make their usage much more
> optimized:
>
> I. ARM64 and RISC-V specific optimizations if inlined:
>
>     A) Because pointer to task_struct is available in a register:
>         1. bpf_get_current_pid_tgid()
>         2. bpf_get_current_task()

These two are used really frequently, so it might make sense to
optimize them (and also bpf_get_current_task_btf(), of course), if
others agree with me.
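
For context on why these are cheap once the task_struct pointer sits in
a register, here is a minimal user-space sketch of what
bpf_get_current_pid_tgid() returns: tgid in the upper 32 bits, pid in
the lower 32. The struct and function names are hypothetical stand-ins
for illustration, not kernel code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two task_struct fields the helper reads. */
struct model_task {
    uint32_t pid;   /* thread ID */
    uint32_t tgid;  /* thread group (process) ID */
};

/*
 * Models the helper's return value: tgid in the upper 32 bits,
 * pid in the lower 32 bits. An inlined version would amount to
 * two loads, a shift, and an OR.
 */
uint64_t model_get_current_pid_tgid(const struct model_task *task)
{
    return ((uint64_t)task->tgid << 32) | task->pid;
}
```

A caller gets the pid back with a cast to uint32_t and the tgid with a
right shift by 32.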

>         3. bpf_set_retval()
>         4. bpf_get_retval()
>         5. bpf_task_pt_regs()

I'm leaning towards "probably not" for these, unless we have a really
good reason to. Inlining is not free in terms of code maintenance and
complexity, so I wouldn't go and inline everything possible. But maybe
others have a different opinion.


>         6. bpf_get_attach_cookie()

definitely no, there are multiple implementations depending on the
specific program type

>
>     B) Because per_cpu offset is available in a register:
>         1. bpf_this_cpu_ptr()

maybe, but I don't think we inline it at the BPF instruction level, so
inlining in the BPF JIT seems premature


>         2. bpf_get_numa_node_id()

I'm not sure how actively this is used, so I'd say no to this one as well.

>
>         These can be inlined in the verifier too using the newly
>         introduced per-cpu instruction.

yep, I'd start with doing BPF assembly inlining for
bpf_this_cpu_ptr/bpf_per_cpu_ptr, tbh
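
To make that concrete, a rough user-space model of the pointer
arithmetic involved: a per-CPU pointer is the base address plus that
CPU's offset, and bpf_per_cpu_ptr() returns NULL for an out-of-range
CPU. MODEL_NR_CPUS, model_per_cpu_offset and model_per_cpu_ptr are
made-up names for this sketch; the real kernel data structures differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU count and per-CPU offset table for this model only. */
#define MODEL_NR_CPUS 4
uintptr_t model_per_cpu_offset[MODEL_NR_CPUS];

/*
 * Models bpf_per_cpu_ptr(): return NULL for an invalid CPU,
 * otherwise base plus that CPU's offset. bpf_this_cpu_ptr() is
 * the same addition with the current CPU's offset, which is the
 * value some architectures keep in a register.
 */
void *model_per_cpu_ptr(void *base, unsigned int cpu)
{
    if (cpu >= MODEL_NR_CPUS)
        return NULL;
    return (char *)base + model_per_cpu_offset[cpu];
}
```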

>
> II. These are very basic writes, can be inlined in the verifier or the JIT:
>     1. bpf_msg_apply_bytes()
>     2. bpf_msg_cork_bytes()
>     3. bpf_set_hash_invalid()

I'd say this is also going overboard with inlining.

>
> I will first try to inline all these in the ARM64 JIT and see the
> performance improvement. I am not sure what would be the best way to
> benchmark all of this inlining.
>
> Andrii, can you suggest something for the benchmarking?
>
> Looking forward to your thoughts on this.
>
> Thanks,
> Puranjay




