On Sat, 2023-12-23 at 18:40 +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@xxxxxxxxxx>
>
> The motivation for inlining bpf_kptr_xchg() comes from performance
> profiling of the bpf memory allocator benchmark. The benchmark uses
> bpf_kptr_xchg() to stash the allocated objects and to pop the stashed
> objects for free. After inlining bpf_kptr_xchg(), the performance of
> object free on an 8-CPU VM increases by about 2%~10%. Inlining also has
> a downside: both the KASAN and KCSAN checks on the pointer will be
> unavailable.
>
> bpf_kptr_xchg() can be inlined by converting the call to
> bpf_kptr_xchg() into an atomic_xchg() instruction. But the conversion
> depends on two conditions:
> 1) The JIT backend supports atomic_xchg() on pointer-sized words.
> 2) For the specific arch, the implementation of xchg() is the same as
>    atomic_xchg() on pointer-sized words.
>
> It seems most 64-bit JIT backends satisfy these two conditions. But
> as a precaution, define a weak function bpf_jit_supports_ptr_xchg()
> to state whether such a conversion is safe, and only support inlining
> on 64-bit hosts.
>
> x86-64 supports the BPF_XCHG atomic operation, and both xchg() and
> atomic_xchg() use arch_xchg() to implement the exchange, so enable the
> inlining of bpf_kptr_xchg() on x86-64 first.
>
> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>

Reviewed-by: Eduard Zingerman <eddyz87@xxxxxxxxx>
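
For readers following along, here is a rough userspace sketch of the idea in the quoted commit message: a weak default that says "not safe", a gate that combines it with the 64-bit requirement, and the exchange itself as a single atomic swap on a pointer-sized word. This is not the kernel code. can_inline_kptr_xchg() and kptr_xchg_model() are hypothetical names, the jit_requested parameter is an assumption about how the gate might be wired, and the sketch assumes GCC or Clang for the weak attribute and the __atomic builtin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace stand-in for the weak default named in the commit message.
 * A JIT backend would supply a strong definition returning true once it
 * knows both conditions hold. Assumes GCC or Clang for the attribute.
 */
__attribute__((weak)) bool bpf_jit_supports_ptr_xchg(void)
{
	return false;
}

/*
 * Hypothetical gate (not a kernel function): allow inlining only when
 * the program is JITed (an assumption of this sketch), the host is
 * 64-bit, and the backend has opted in.
 */
static bool can_inline_kptr_xchg(bool jit_requested)
{
	return jit_requested && sizeof(void *) == 8 &&
	       bpf_jit_supports_ptr_xchg();
}

/*
 * Hypothetical model of the exchange itself: atomically store new_ptr
 * into *slot and return the pointer that was there before.
 */
static void *kptr_xchg_model(void **slot, void *new_ptr)
{
	return __atomic_exchange_n(slot, new_ptr, __ATOMIC_SEQ_CST);
}
```

Stashing an object is kptr_xchg_model(&slot, obj), which returns the previous occupant (NULL for an empty slot); popping it for free is kptr_xchg_model(&slot, NULL). With only the weak default linked in, can_inline_kptr_xchg() returns false, which matches the "precaution" described above: an arch has to opt in explicitly.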