On Fri, Mar 29, 2024 at 11:47 AM Andrii Nakryiko <andrii@xxxxxxxxxx> wrote:
>
> If BPF JIT supports per-CPU LDX instructions, inline
> bpf_get_smp_processor_id() to eliminate unnecessary function calls.
>
> Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> ---
>  kernel/bpf/verifier.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index edb650667f44..24caec8b200d 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -20072,6 +20072,23 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>                         goto next_insn;
>                 }
>
> +               /* Implement bpf_get_smp_processor_id() inline. */
> +               if (insn->imm == BPF_FUNC_get_smp_processor_id &&
> +                   prog->jit_requested && bpf_jit_supports_percpu_insns()) {
> +                       insn_buf[0] = BPF_MOV32_IMM(BPF_REG_0, (u32)(long)&pcpu_hot.cpu_number);

So CI reminds me that this part will have to be architecture-specific.

We can keep BPF_FUNC_get_smp_processor_id inlining here in
kernel/bpf/verifier.c, but have arch-specific #ifdef/#elif/#endif
logic? Or we can have an arch_bpf_inline_helper() call or something,
where different architectures can more cleanly implement arch-specific
inlining logic? What would be the preferred way?

For arm64, it seems we need to just do &cpu_number instead of
&pcpu_hot.cpu_number. For s390x there is some S390_lowcore thing
involved, which I have no idea about, so I'll be asking for someone's
help there.

> +                       insn_buf[1] = BPF_LDX_MEM_PERCPU(BPF_W, BPF_REG_0, BPF_REG_0, 0);
> +                       cnt = 2;
> +
> +                       new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
> +                       if (!new_prog)
> +                               return -ENOMEM;
> +
> +                       delta += cnt - 1;
> +                       env->prog = prog = new_prog;
> +                       insn = new_prog->insnsi + i + delta;
> +                       goto next_insn;
> +               }
> +
>                 /* Implement bpf_get_func_arg inline. */
>                 if (prog_type == BPF_PROG_TYPE_TRACING &&
>                     insn->imm == BPF_FUNC_get_func_arg) {
> --
> 2.43.0
>
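
To make the arch_bpf_inline_helper() option a bit more concrete, here is a
rough userspace sketch of the shape such a hook could take. Everything in it
is hypothetical: the toy_* types, opcode values, helper IDs and per-CPU
offsets are placeholders rather than kernel definitions, and no such hook
exists today. The idea is only that each architecture returns the number of
instructions it wrote, or 0 to tell the generic code to keep the helper call.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct bpf_insn; NOT the kernel layout. */
struct toy_insn {
	uint8_t op;
	uint8_t dst_reg;
	uint8_t src_reg;
	int16_t off;
	int32_t imm;
};

/* Placeholder opcodes, helper IDs and registers; values are arbitrary. */
enum { TOY_MOV32_IMM = 1, TOY_LDX_PERCPU_W = 2 };
enum { TOY_HELPER_GET_SMP_PROCESSOR_ID = 1, TOY_HELPER_OTHER = 2 };
enum { TOY_REG_0 = 0 };

/*
 * Placeholder per-CPU offsets, standing in for &pcpu_hot.cpu_number
 * (x86-64) and &cpu_number (arm64). Real values come from the linker.
 */
enum { TOY_X86_CPU_NUMBER_OFF = 0x100, TOY_ARM64_CPU_NUMBER_OFF = 0x200 };

/*
 * Shared two-instruction sequence: load the per-CPU offset into r0,
 * then do a 32-bit per-CPU load through r0.
 * Returns 2 on success, 0 if the buffer is too small.
 */
int toy_emit_percpu_load(struct toy_insn *buf, size_t cap, int32_t percpu_off)
{
	if (cap < 2)
		return 0;

	buf[0] = (struct toy_insn){ .op = TOY_MOV32_IMM,
				    .dst_reg = TOY_REG_0, .imm = percpu_off };
	buf[1] = (struct toy_insn){ .op = TOY_LDX_PERCPU_W,
				    .dst_reg = TOY_REG_0, .src_reg = TOY_REG_0 };
	return 2;
}

/* Default hook: inline nothing, so the generic code keeps the call. */
int toy_default_inline_helper(int helper_id, struct toy_insn *buf, size_t cap)
{
	(void)helper_id;
	(void)buf;
	(void)cap;
	return 0;
}

/* Hypothetical x86-64 hook: cpu number lives in pcpu_hot. */
int toy_x86_inline_helper(int helper_id, struct toy_insn *buf, size_t cap)
{
	if (helper_id == TOY_HELPER_GET_SMP_PROCESSOR_ID)
		return toy_emit_percpu_load(buf, cap, TOY_X86_CPU_NUMBER_OFF);
	return 0;
}

/* Hypothetical arm64 hook: cpu number is a plain per-CPU variable. */
int toy_arm64_inline_helper(int helper_id, struct toy_insn *buf, size_t cap)
{
	if (helper_id == TOY_HELPER_GET_SMP_PROCESSOR_ID)
		return toy_emit_percpu_load(buf, cap, TOY_ARM64_CPU_NUMBER_OFF);
	return 0;
}
```

With something along these lines, do_misc_fixups() would only need to check
for a non-zero count and patch in that many instructions, and an architecture
like s390x could stay on the default hook until someone works out the
lowcore details.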