Pu Lehui <pulehui@xxxxxxxxxx> writes:
> On 2023/7/19 23:18, Björn Töpel wrote:
>> Pu Lehui <pulehui@xxxxxxxxxx> writes:
>>
>>> On 2023/7/19 4:06, Björn Töpel wrote:
>>>> Pu Lehui <pulehui@xxxxxxxxxxxxxxx> writes:
>>>>
>>>>> From: Pu Lehui <pulehui@xxxxxxxxxx>
>>>>>
>>>>> Commit 6724a76cff85 ("riscv: ftrace: Reduce the detour code size to
>>>>> half") optimizes the detour code size of kernel functions to half with
>>>>> T0 register and the upcoming DYNAMIC_FTRACE_WITH_DIRECT_CALLS of riscv
>>>>> is based on this optimization, we need to adapt riscv bpf trampoline
>>>>> based on this. One thing to do is to reduce detour code size of bpf
>>>>> programs, and the second is to deal with the return address after the
>>>>> execution of bpf trampoline. Meanwhile, add more comments and rename
>>>>> some variables to make more sense. The related tests have passed.
>>>>>
>>>>> This adaptation needs to be merged before the upcoming
>>>>> DYNAMIC_FTRACE_WITH_DIRECT_CALLS of riscv, otherwise it will crash due
>>>>> to a mismatch in the return address. So we target this modification to
>>>>> bpf tree and add fixes tag for locating.
>>>>
>>>> Thank you for working on this!
>>>>
>>>>> Fixes: 6724a76cff85 ("riscv: ftrace: Reduce the detour code size to half")
>>>>
>>>> This is not a fix. Nothing is broken. Only that this patch must come
>>>> before or as part of the ftrace series.
>>>
>>> Yep, it's really not a fix. I'm not sure whether this patch, if targeted
>>> at the bpf-next tree, can land ahead of the ftrace series in the riscv
>>> tree?
>>
>> For this patch, I'd say it's easier to take it via the RISC-V tree, IFF
>> the ftrace series is in for-next.
>>
>
> Alright, so let's target the riscv tree to avoid that crash.
>
>> [...]
>>
>>>>> +#define DETOUR_NINSNS 2
>>>>
>>>> Better name? Maybe call this patchable function entry something? Also,
>>>
>>> How about RV_FENTRY_NINSNS?
>>
>> Sure. And more importantly that it's actually used in the places where
>> nops/skips are done.
>
> the new one is suited up.
>
>>
>>>> to catch future breaks like this -- would it make sense to have a
>>>> static_assert() combined with something tied to
>>>> -fpatchable-function-entry= from arch/riscv/Makefile?
>>>
>>> It is very necessary, but it doesn't seem to be easy. I tried to find a
>>> related GCC facility, something like __builtin_xxx, but I haven't found
>>> one so far. I also tried exposing it as CONFIG_PATCHABLE_FUNCTION_ENTRY=4
>>> in the Makefile and then using static_assert, but obviously that
>>> shouldn't be done. Maybe we can deal with this later when we have a
>>> solution?
>>
>> Ok!
>>
>> [...]
>>
>>>>> @@ -787,20 +762,19 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
>>>>>  	int i, ret, offset;
>>>>>  	int *branches_off = NULL;
>>>>>  	int stack_size = 0, nregs = m->nr_args;
>>>>> -	int retaddr_off, fp_off, retval_off, args_off;
>>>>> -	int nregs_off, ip_off, run_ctx_off, sreg_off;
>>>>> +	int fp_off, retval_off, args_off, nregs_off, ip_off, run_ctx_off, sreg_off;
>>>>>  	struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
>>>>>  	struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
>>>>>  	struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
>>>>>  	void *orig_call = func_addr;
>>>>> -	bool save_ret;
>>>>> +	bool save_retval, traced_ret;
>>>>>  	u32 insn;
>>>>>
>>>>>  	/* Generated trampoline stack layout:
>>>>>  	 *
>>>>>  	 * FP - 8           [ RA of parent func ] return address of parent
>>>>>  	 *                                        function
>>>>> -	 * FP - retaddr_off [ RA of traced func ] return address of traced
>>>>> +	 * FP - 16          [ RA of traced func ] return address of traced
>>>>
>>>> BPF code uses frame pointers. Shouldn't the trampoline frame look like a
>>>> regular frame [1], i.e. start with return address followed by previous
>>>> frame pointer?
>>>>
>>>
>>> oops, will fix it. Also we need to consider two types of trampoline
>>> stack layout, that is:
>>>
>>> * 1. trampoline called from function entry
>>> * --------------------------------------
>>> * FP + 8  [ RA of parent func ] return address of parent
>>> *                               function
>>> * FP + 0  [ FP                ]
>>> *
>>> * FP - 8  [ RA of traced func ] return address of traced
>>> *                               function
>>> * FP - 16 [ FP                ]
>>> * --------------------------------------
>>> *
>>> * 2. trampoline called directly
>>> * --------------------------------------
>>> * FP - 8  [ RA of caller func ] return address of caller
>>> *                               function
>>> * FP - 16 [ FP                ]
>>> * --------------------------------------
>>
>> Hmm, could you expand a bit on this? The stack frame top 16B (8+8)
>> should follow what the psabi suggests, regardless of the call site?
>>
>
> Maybe I've missed something important! Or maybe I'm misunderstanding
> what you mean. But anyway, there is something to show. From my
> perspective, we should construct a complete stack frame; otherwise one
> layer of the stack will be lost in the call trace when
> CONFIG_FRAME_POINTER is enabled.
>
> We can verify it with
> `echo 1 > /sys/kernel/debug/tracing/options/stacktrace`, and the
> results are shown below:
>
> 1. complete stack frame
> * --------------------------------------
> * FP + 8  [ RA of parent func ] return address of parent
> *                               function
> * FP + 0  [ FP                ]
> *
> * FP - 8  [ RA of traced func ] return address of traced
> *                               function
> * FP - 16 [ FP                ]
> * --------------------------------------
> the stacktrace is:
>
> => trace_event_raw_event_bpf_trace_printk
> => bpf_trace_printk
> => bpf_prog_ad7f62a5e7675635_bpf_prog
> => bpf_trampoline_6442536643
> => do_empty
> => meminfo_proc_show
> => seq_read_iter
> => proc_reg_read_iter
> => copy_splice_read
> => vfs_splice_read
> => splice_direct_to_actor
> => do_splice_direct
> => do_sendfile
> => sys_sendfile64
> => do_trap_ecall_u
> => ret_from_exception
>
> 2. omit one FP
> * --------------------------------------
> * FP + 0  [ RA of parent func ] return address of parent
> *                               function
> * FP - 8  [ RA of traced func ] return address of traced
> *                               function
> * FP - 16 [ FP                ]
> * --------------------------------------
> the stacktrace is:
>
> => trace_event_raw_event_bpf_trace_printk
> => bpf_trace_printk
> => bpf_prog_ad7f62a5e7675635_bpf_prog
> => bpf_trampoline_6442491529
> => do_empty
> => seq_read_iter
> => proc_reg_read_iter
> => copy_splice_read
> => vfs_splice_read
> => splice_direct_to_actor
> => do_splice_direct
> => do_sendfile
> => sys_sendfile64
> => do_trap_ecall_u
> => ret_from_exception
>
> It loses the 'meminfo_proc_show' layer.

(Lehui was friendly enough to explain the details for me offlist.)

Aha, now I get what you mean! When we're getting into the trampoline from
the fentry side, an additional stack frame needs to be created. Otherwise,
the unwinding will be incorrect.

So (for the rest of the readers ;-)), the BPF trampoline can be called
from:

A. Tracing: here, we're calling into the trampoline via the
   fentry/patchable entry. In this scenario, an additional stack frame
   needs to be constructed for proper unwinding.

B. kfuncs: here, the call into the trampoline is just a "regular call",
   and no additional stack frame is needed.

@Guo @Song Is the RISC-V ftrace code creating an additional stack frame,
or is the stack unwinding incorrect when the fentry is patched?

Thanks for clearing it up for me, Lehui!

Björn