On Mon, Jul 01, 2024 at 09:03:52PM GMT, Nicholas Piggin wrote:
> On Fri Jun 21, 2024 at 5:09 AM AEST, Naveen N Rao wrote:
> > Add support for bpf_arch_text_poke() and arch_prepare_bpf_trampoline()
> > for 64-bit powerpc.
>
> What do BPF trampolines give you?

At a very basic level, they provide a way to attach bpf programs at
function entry/exit - as an alternative to ftrace/kprobe - in a more
optimal manner. Commit fec56f5890d9 ("bpf: Introduce BPF trampoline")
has more details.

>
> > BPF prog JIT is extended to mimic the 64-bit powerpc approach for
> > ftrace, having a single nop at function entry, followed by the
> > function profiling sequence out-of-line and a separate long branch
> > stub for calls to trampolines that are out of range. A dummy_tramp
> > is provided to simplify synchronization, similar to arm64.
>
> Synchronization - between BPF and ftrace interfaces?
>
> > BPF trampolines adhere to the existing ftrace ABI utilizing a
> > two-instruction profiling sequence, as well as the newer ABI
> > utilizing a three-instruction profiling sequence enabling return
> > with a 'blr'. The trampoline code itself closely follows the x86
> > implementation.
> >
> > While the code is generic, BPF trampolines are only enabled on 64-bit
> > powerpc. 32-bit powerpc will need testing and some updates.
> >
> > Signed-off-by: Naveen N Rao <naveen@xxxxxxxxxx>
>
> Just a quick glance for now, and I don't know BPF code much.
>
> > ---
> >  arch/powerpc/include/asm/ppc-opcode.h |  14 +
> >  arch/powerpc/net/bpf_jit.h            |  11 +
> >  arch/powerpc/net/bpf_jit_comp.c       | 702 +++++++++++++++++++++++++-
> >  arch/powerpc/net/bpf_jit_comp32.c     |   7 +-
> >  arch/powerpc/net/bpf_jit_comp64.c     |   7 +-
> >  5 files changed, 738 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
> > index 076ae60b4a55..9eaa2c5d9b73 100644
> > --- a/arch/powerpc/include/asm/ppc-opcode.h
> > +++ b/arch/powerpc/include/asm/ppc-opcode.h
> > @@ -585,12 +585,26 @@
> >  #define PPC_RAW_MTSPR(spr, d)	(0x7c0003a6 | ___PPC_RS(d) | __PPC_SPR(spr))
> >  #define PPC_RAW_EIEIO()		(0x7c0006ac)
> >
> > +/* bcl 20,31,$+4 */
> > +#define PPC_RAW_BCL()		(0x429f0005)
>
> This is the special bcl form that gives the current address.
> Maybe call it PPC_RAW_BCL4()

Sure.

> >
> > +void dummy_tramp(void);
> > +
> > +asm (
> > +"	.pushsection .text, \"ax\", @progbits	;"
> > +"	.global dummy_tramp			;"
> > +"	.type dummy_tramp, @function		;"
> > +"dummy_tramp:					;"
> > +#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
> > +"	blr					;"
> > +#else
> > +"	mflr	11				;"
>
> Can you just drop this instruction? The caller will always
> have it in r11?

Indeed. Will add a comment and remove the instruction.
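Something along these lines - a sketch of the revised stub with the
mflr dropped and a comment added (exact comment wording is
illustrative):

	asm (
	"	.pushsection .text, \"ax\", @progbits	;"
	"	.global dummy_tramp			;"
	"	.type dummy_tramp, @function		;"
	"dummy_tramp:					;"
	#ifdef CONFIG_FTRACE_PFE_OUT_OF_LINE
	"	blr					;"
	#else
	/* The caller always has the saved LR in r11, so no 'mflr 11' needed */
	"	mtctr	11				;"
	"	mtlr	0				;"
	"	bctr					;"
	#endif
	"	.size dummy_tramp, .-dummy_tramp	;"
	"	.popsection				;"
	);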
> > > +" mtctr 11 ;" > > +" mtlr 0 ;" > > +" bctr ;" > > +#endif > > +" .size dummy_tramp, .-dummy_tramp ;" > > +" .popsection ;" > > +); > > + > > +void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx) > > +{ > > + int ool_stub_idx, long_branch_stub_idx; > > + > > + /* > > + * Out-of-line stub: > > + * mflr r0 > > + * [b|bl] tramp > > + * mtlr r0 // only with CONFIG_FTRACE_PFE_OUT_OF_LINE > > + * b bpf_func + 4 > > + */ > > + ool_stub_idx = ctx->idx; > > + EMIT(PPC_RAW_MFLR(_R0)); > > + EMIT(PPC_RAW_NOP()); > > + if (IS_ENABLED(CONFIG_FTRACE_PFE_OUT_OF_LINE)) > > + EMIT(PPC_RAW_MTLR(_R0)); > > + WARN_ON_ONCE(!is_offset_in_branch_range(4 - (long)ctx->idx * 4)); /* TODO */ > > + EMIT(PPC_RAW_BRANCH(4 - (long)ctx->idx * 4)); > > + > > + /* > > + * Long branch stub: > > + * .long <dummy_tramp_addr> > > + * mflr r11 > > + * bcl 20,31,$+4 > > + * mflr r12 > > + * ld r12, -8-SZL(r12) > > + * mtctr r12 > > + * mtlr r11 // needed to retain ftrace ABI > > + * bctr > > + */ > > You could avoid clobbering LR on >= POWER9 with addpcis instruction. Or > use a pcrel load with pcrel even. I guess that's something to do later. Yes, much of BPF JIT could use a re-look to consider opportunities to emit prefix instructions. Thanks, Naveen