Re: [PATCH bpf-next v3 0/3] bpf: inline bpf_kptr_xchg()

Hou Tao <houtao@xxxxxxxxxxxxxxx> · Sat, 6 Jan 2024 10:34:37 +0800

On 1/6/2024 6:53 AM, Song Liu wrote:
> On Fri, Jan 5, 2024 at 2:47 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>> From: Hou Tao <houtao1@xxxxxxxxxx>
>>
>> Hi,
>>
>> The motivation of inlining bpf_kptr_xchg() comes from the performance
>> profiling of bpf memory allocator benchmark [1]. The benchmark uses
>> bpf_kptr_xchg() to stash the allocated objects and to pop the stashed
>> objects for free. After inlining bpf_kptr_xchg(), the performance of
>> object free on an 8-CPU VM increases by about 2%~10%. However, the
>> performance gain comes with costs: both the kasan and kcsan checks on
>> the pointer will be unavailable. Initially the inline was implemented
>> in do_jit() for x86-64 directly, but I think it will be more portable
>> to implement the inline in the verifier.
> How much work would it take to enable this on other major architectures?
> AFAICT, most jit compilers already handle BPF_XCHG, so it should be
> relatively simple?

Yes. I think enabling this inline will be relatively simple. As said in
patch #1, the inline depends on two conditions:
1) atomic_xchg() is supported on pointer-sized words.
2) the implementation of xchg() is the same as atomic_xchg() on
pointer-sized words.
For condition 1), I think the JIT backends of most major architectures
already support it. So the remaining work is to check the implementations
of xchg() and atomic_xchg(), to enable the inline, and to do more testing.
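
To make the two conditions concrete, here is a minimal user-space sketch in
C11. It is not the kernel or verifier code: kptr_xchg_sketch() is a
hypothetical stand-in for bpf_kptr_xchg(), and atomic_exchange() from
<stdatomic.h> stands in for the kernel's xchg()/atomic_xchg(). It only
shows the semantics the inlined BPF_XCHG instruction has to preserve: an
atomic swap on a pointer-sized word that returns the old pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Condition 1): the exchange must operate on a pointer-sized word. */
static_assert(sizeof(void *) == sizeof(uintptr_t),
              "expected uintptr_t to be pointer-sized");

/*
 * Hypothetical stand-in for bpf_kptr_xchg() (name and signature are
 * assumptions for illustration): atomically store new_ptr into *slot
 * and return the pointer that was stashed there before.
 */
static void *kptr_xchg_sketch(_Atomic(uintptr_t) *slot, void *new_ptr)
{
    return (void *)atomic_exchange(slot, (uintptr_t)new_ptr);
}
```

Stashing an object is then kptr_xchg_sketch(&slot, obj), and popping it for
free is kptr_xchg_sketch(&slot, NULL). Condition 2) is what allows the
verifier to replace the helper call with a single atomic exchange
instruction without changing this behaviour.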

I will try to enable the inline on arm64 first. Will x86-64 + arm64 be
enough for the definition of "major architectures", or should it also
include riscv, s390, and powerpc?

> Other than this, for the set
>
> Acked-by: Song Liu <song@xxxxxxxxxx>

Thanks for the ack.
>
> Thanks,
> Song