Pu Lehui <pulehui@xxxxxxxxxx> writes:

> On 2023/7/19 4:06, Björn Töpel wrote:
>> Pu Lehui <pulehui@xxxxxxxxxxxxxxx> writes:
>>
>>> From: Pu Lehui <pulehui@xxxxxxxxxx>
>>>
>>> Commit 6724a76cff85 ("riscv: ftrace: Reduce the detour code size to
>>> half") halves the detour code size of kernel functions by using the T0
>>> register, and the upcoming DYNAMIC_FTRACE_WITH_DIRECT_CALLS of riscv
>>> is based on this optimization, so we need to adapt the riscv bpf
>>> trampoline accordingly. The first thing to do is to reduce the detour
>>> code size of bpf programs, and the second is to deal with the return
>>> address after the execution of the bpf trampoline. Meanwhile, add more
>>> comments and rename some variables so that they make more sense. The
>>> related tests have passed.
>>>
>>> This adaptation needs to be merged before the upcoming
>>> DYNAMIC_FTRACE_WITH_DIRECT_CALLS of riscv, otherwise it will crash due
>>> to a mismatch in the return address. So we target this modification at
>>> the bpf tree and add a Fixes tag for locating it.
>>
>> Thank you for working on this!
>>
>>> Fixes: 6724a76cff85 ("riscv: ftrace: Reduce the detour code size to half")
>>
>> This is not a fix. Nothing is broken. Only that this patch must come
>> before or as part of the ftrace series.
>
> Yep, it's really not a fix. I'm not sure whether this patch, targeted at
> the bpf-next tree, can land ahead of the ftrace series in the riscv tree?

For this patch, I'd say it's easier to take it via the RISC-V tree, IFF
the ftrace series is in for-next.

[...]

>>> +#define DETOUR_NINSNS 2
>>
>> Better name? Maybe call this patchable function entry something? Also,
>
> How about RV_FENTRY_NINSNS?

Sure. And more importantly, make sure it's actually used in the places
where the nops/skips are done.

>> to catch future breaks like this -- would it make sense to have a
>> static_assert() combined with something tied to
>> -fpatchable-function-entry= from arch/riscv/Makefile?
>
> It is very necessary, but it doesn't seem to be easy.
> I tried to find a related GCC facility, something like __builtin_xxx,
> but haven't found one so far. I also tried to define something like
> CONFIG_PATCHABLE_FUNCTION_ENTRY=4 in the Makefile and then
> static_assert on it, but obviously that shouldn't be done. Maybe we can
> deal with this later when we have a solution?

Ok!

[...]

>>> @@ -787,20 +762,19 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
>>>  	int i, ret, offset;
>>>  	int *branches_off = NULL;
>>>  	int stack_size = 0, nregs = m->nr_args;
>>> -	int retaddr_off, fp_off, retval_off, args_off;
>>> -	int nregs_off, ip_off, run_ctx_off, sreg_off;
>>> +	int fp_off, retval_off, args_off, nregs_off, ip_off, run_ctx_off, sreg_off;
>>>  	struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
>>>  	struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
>>>  	struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
>>>  	void *orig_call = func_addr;
>>> -	bool save_ret;
>>> +	bool save_retval, traced_ret;
>>>  	u32 insn;
>>>
>>>  	/* Generated trampoline stack layout:
>>>  	 *
>>>  	 * FP - 8           [ RA of parent func ] return address of parent
>>>  	 *                                        function
>>> -	 * FP - retaddr_off [ RA of traced func ] return address of traced
>>> +	 * FP - 16          [ RA of traced func ] return address of traced
>>
>> BPF code uses frame pointers. Shouldn't the trampoline frame look like a
>> regular frame [1], i.e. start with return address followed by previous
>> frame pointer?
>>
>
> Oops, will fix it. Also, we need to consider two types of trampoline
> stack layout, that is:
>
> * 1. trampoline called from function entry
> * --------------------------------------
> * FP + 8  [ RA of parent func ] return address of parent
> *                               function
> * FP + 0  [ FP                ]
> *
> * FP - 8  [ RA of traced func ] return address of traced
> *                               function
> * FP - 16 [ FP                ]
> * --------------------------------------
> *
> * 2. trampoline called directly
> * --------------------------------------
> * FP - 8  [ RA of caller func ] return address of caller
> *                               function
> * FP - 16 [ FP                ]
> * --------------------------------------

Hmm, could you expand a bit on this? The top 16B (8+8) of the stack
frame should follow what the psABI suggests, regardless of the call
site?

Maybe it's me that's not following -- please explain a bit more!


Björn
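For readers following the detour-size discussion, here is a minimal C sketch (not the kernel's own helpers) of a two-instruction detour through T0: `auipc t0, hi20` followed by `jalr t0, lo12(t0)` jumps to the auipc's pc + offset and leaves the return address in t0. Since ra, which still holds the return address into the parent, is never overwritten, it does not need to be spilled and reloaded around the call. Assuming the earlier detour saved and restored ra around an auipc/jalr pair, that is where the halving comes from, and it is presumably why the trampoline has to return through t0. The names `make_detour_t0` and `detour_offset` are hypothetical, and the sketch assumes a signed 32-bit pc-relative offset.

```c
#include <stdint.h>

/* RISC-V base opcodes and the t0 (x5) register number. */
#define OPC_AUIPC 0x17u
#define OPC_JALR  0x67u
#define REG_T0    5u

/* auipc rd, imm20 (U-type): immediate goes in bits 31:12. */
static uint32_t enc_auipc(uint32_t rd, uint32_t imm20)
{
	return ((imm20 & 0xfffffu) << 12) | (rd << 7) | OPC_AUIPC;
}

/* jalr rd, imm12(rs1) (I-type, funct3 = 0): immediate in bits 31:20. */
static uint32_t enc_jalr(uint32_t rd, uint32_t rs1, uint32_t imm12)
{
	return ((imm12 & 0xfffu) << 20) | (rs1 << 15) | (rd << 7) | OPC_JALR;
}

/*
 * Hypothetical helper: fill insns[] with "auipc t0, hi; jalr t0, lo(t0)",
 * which jumps to pc + offset and stores the return address in t0, leaving
 * ra untouched. jalr sign-extends lo, so hi is rounded by +0x800.
 */
static void make_detour_t0(int32_t offset, uint32_t insns[2])
{
	uint32_t lo = (uint32_t)offset & 0xfffu;
	uint32_t hi = ((uint32_t)offset + 0x800u) >> 12;

	insns[0] = enc_auipc(REG_T0, hi);
	insns[1] = enc_jalr(REG_T0, REG_T0, lo);
}

/* Hypothetical check: decode the pair back to the pc-relative offset. */
static int32_t detour_offset(const uint32_t insns[2])
{
	uint32_t hi = insns[0] & 0xfffff000u;
	uint32_t lo = insns[1] >> 20;
	uint32_t lo_sext = (lo & 0x800u) ? (lo | 0xfffff000u) : lo;

	/* The sum wraps modulo 2^32, as the address arithmetic does on the CPU. */
	return (int32_t)(hi + lo_sext);
}
```

The `+ 0x800` rounding compensates for jalr sign-extending its 12-bit immediate. Whenever bit 11 of the offset is set, the low part acts as a negative number, so the upper part is bumped by one 4 KiB page to land back on the same target.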
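One way to picture the two trampoline layouts discussed above is a toy C model of frame-pointer records (not kernel code). It assumes the psABI convention referred to in the thread: fp equals sp at function entry, the return address is stored at fp - 8, and the caller's fp is stored at fp - 16. All names (`push_frame`, `tramp_from_entry`, `tramp_direct`, `unwind`) and the slot-array "stack" are hypothetical. In case 1 the model pushes two records. The trampoline's own top 16 bytes therefore still follow the convention, while FP + 8 / FP + 0 hold a stand-in record for the traced function, which has not built its frame yet.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toy stack of 8-byte slots growing downward. A frame pointer here is a
 * slot index equal to sp at function entry; the frame record sits right
 * below it: ra at fp - 8 bytes (slot fp - 1), caller's fp at fp - 16
 * bytes (slot fp - 2). Slot index 0 is used as the end-of-chain marker.
 */
#define SLOTS 32
static uint64_t stack[SLOTS];

/* Store a frame record at *sp, move *sp below it, return the new fp. */
static size_t push_frame(size_t *sp, uint64_t ra, size_t prev_fp)
{
	size_t fp = *sp;

	stack[fp - 1] = ra;       /* fp - 8:  return address         */
	stack[fp - 2] = prev_fp;  /* fp - 16: caller's frame pointer */
	*sp = fp - 2;
	return fp;
}

/*
 * Case 1: entered from the traced function's detour, before the traced
 * function built a frame. ra holds the return address into the parent
 * and t0 holds the one into the traced function. Relative to the returned
 * fp: FP + 8 = ra, FP + 0 = parent fp, FP - 8 = t0, FP - 16 = stand-in fp.
 */
static size_t tramp_from_entry(size_t *sp, uint64_t ra, uint64_t t0,
			       size_t parent_fp)
{
	size_t traced_fp = push_frame(sp, ra, parent_fp);

	return push_frame(sp, t0, traced_fp);
}

/* Case 2: called directly; one ordinary record for the caller. */
static size_t tramp_direct(size_t *sp, uint64_t ra, size_t caller_fp)
{
	return push_frame(sp, ra, caller_fp);
}

/* Follow the fp chain like a frame-pointer unwinder, collecting ras. */
static size_t unwind(size_t fp, uint64_t *ras, size_t max)
{
	size_t n = 0;

	while (fp != 0 && n < max) {
		ras[n++] = stack[fp - 1];
		fp = (size_t)stack[fp - 2];
	}
	return n;
}
```

In case 1, unwinding from the trampoline's fp visits t0 (into the traced function), then ra (into the parent), then the parent's own record. In case 2 it visits only the direct caller's chain. In both cases the record just below the trampoline's fp has the same RA-then-FP shape, which is the psABI property the question above is about.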