On Wed, Nov 24, 2021 at 01:43:22PM -0800, Andrii Nakryiko wrote: > On Thu, Nov 18, 2021 at 3:25 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote: > > > > Adding bpf_arg/bpf_ret_value helpers for tracing programs > > that returns traced function arguments. > > > > Get n-th argument of the traced function: > > long bpf_arg(void *ctx, int n) > > > > Get return value of the traced function: > > long bpf_ret_value(void *ctx) > > > > The trampoline now stores number of arguments on ctx-8 > > address, so it's easy to verify argument index and find > > return value argument. > > > > Moving function ip address on the trampoline stack behind > > the number of functions arguments, so it's now stored > > on ctx-16 address. > > > > Both helpers are inlined by verifier. > > > > Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx> > > --- > > It would be great to land these changes separate from your huge patch > set. There are some upcoming BPF trampoline related changes that will > touch this (to add BPF cookie support for fentry/fexit progs), so > would be nice to minimize the interdependencies. So maybe post this > patch separately (probably after holidays ;) ). ok > > > arch/x86/net/bpf_jit_comp.c | 18 +++++++++++--- > > include/uapi/linux/bpf.h | 14 +++++++++++ > > kernel/bpf/verifier.c | 45 ++++++++++++++++++++++++++++++++-- > > kernel/trace/bpf_trace.c | 38 +++++++++++++++++++++++++++- > > tools/include/uapi/linux/bpf.h | 14 +++++++++++ > > 5 files changed, 122 insertions(+), 7 deletions(-) > > > > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c > > index 631847907786..67e8ac9aaf0d 100644 > > --- a/arch/x86/net/bpf_jit_comp.c > > +++ b/arch/x86/net/bpf_jit_comp.c > > @@ -1941,7 +1941,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i > > void *orig_call) > > { > > int ret, i, nr_args = m->nr_args; > > - int stack_size = nr_args * 8; > > + int stack_size = nr_args * 8 + 8 /* nr_args */; > > this /* nr_args */ next to 8 is super confusing, would be better to > expand the comment; might be a good idea to have some sort of a > description of possible stack layouts (e.g., fexit has some extra > stuff on the stack, I think, but it's impossible to remember and need > to recover that knowledge from the assembly code, basically). > > > struct bpf_tramp_progs *fentry = &tprogs[BPF_TRAMP_FENTRY]; > > struct bpf_tramp_progs *fexit = &tprogs[BPF_TRAMP_FEXIT]; > > struct bpf_tramp_progs *fmod_ret = &tprogs[BPF_TRAMP_MODIFY_RETURN]; > > @@ -1987,12 +1987,22 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i > > EMIT4(0x48, 0x83, 0xe8, X86_PATCH_SIZE); > > emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -stack_size); > > > > - /* Continue with stack_size for regs storage, stack will > > - * be correctly restored with 'leave' instruction. > > - */ > > + /* Continue with stack_size for 'nr_args' storage */ > > same, I don't think this comment really helps, just confuses some more ok, I'll put some more comments with list of possible the stack layouts > > > stack_size -= 8; > > } > > > > + /* Store number of arguments of the traced function: > > + * mov rax, nr_args > > + * mov QWORD PTR [rbp - stack_size], rax > > + */ > > + emit_mov_imm64(&prog, BPF_REG_0, 0, (u32) nr_args); > > + emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -stack_size); > > + > > + /* Continue with stack_size for regs storage, stack will > > + * be correctly restored with 'leave' instruction. > > + */ > > + stack_size -= 8; > > I think "stack_size" as a name outlived itself and it just makes > everything harder to understand. It's used more like a stack offset > (relative to rsp or rbp) for different things. Would it make code > worse if we had few offset variables instead (or rather in addition, > we still need to calculate a full stack_size; it's just it's constant > re-adjustment is what's hard to keep track of), like regs_off, > ret_ip_off, arg_cnt_off, etc? let's see, I'll try that > > > + > > save_regs(m, &prog, nr_args, stack_size); > > > > if (flags & BPF_TRAMP_F_CALL_ORIG) { > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > index a69e4b04ffeb..fc8b344eecba 100644 > > --- a/include/uapi/linux/bpf.h > > +++ b/include/uapi/linux/bpf.h > > @@ -4957,6 +4957,18 @@ union bpf_attr { > > * **-ENOENT** if *task->mm* is NULL, or no vma contains *addr*. > > * **-EBUSY** if failed to try lock mmap_lock. > > * **-EINVAL** for invalid **flags**. > > + * > > + * long bpf_arg(void *ctx, int n) > > __u32 n ? ok > > > + * Description > > + * Get n-th argument of the traced function (for tracing programs). > > + * Return > > + * Value of the argument. > > What about errors? those need to be documented. ok > > > + * > > + * long bpf_ret_value(void *ctx) > > + * Description > > + * Get return value of the traced function (for tracing programs). > > + * Return > > + * Return value of the traced function. > > Same, errors not documented. Also would be good to document what > happens when ret_value is requested in the context where there is no > ret value (e.g., fentry) ugh, that's not handled at the moment.. should we fail when we see bpf_ret_value helper call in fentry program? > > > */ > > #define __BPF_FUNC_MAPPER(FN) \ > > FN(unspec), \ > > @@ -5140,6 +5152,8 @@ union bpf_attr { > > FN(skc_to_unix_sock), \ > > FN(kallsyms_lookup_name), \ > > FN(find_vma), \ > > + FN(arg), \ > > + FN(ret_value), \ > > We already have bpf_get_func_ip, so why not continue a tradition and > call these bpf_get_func_arg() and bpf_get_func_ret(). Nice, short, > clean, consistent. ok > > BTW, a wild thought. Wouldn't it be cool to have these functions work > with kprobe/kretprobe as well? Do you think it's possible? right, bpf_get_func_ip already works for kprobes struct kprobe could have the btf_func_model of the traced function, so in case we trace function directly on the entry point we could read arguments registers based on the btf_func_model I'll check with Massami > > > /* */ > > > > /* integer value in 'imm' field of BPF_CALL instruction selects which helper > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > > index fac0c3518add..d4249ef6ca7e 100644 > > --- a/kernel/bpf/verifier.c > > +++ b/kernel/bpf/verifier.c > > @@ -13246,11 +13246,52 @@ static int do_misc_fixups(struct bpf_verifier_env *env) > > continue; > > } > > > > + /* Implement bpf_arg inline. */ > > + if (prog_type == BPF_PROG_TYPE_TRACING && > > + insn->imm == BPF_FUNC_arg) { > > + /* Load nr_args from ctx - 8 */ > > + insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8); > > + insn_buf[1] = BPF_JMP32_REG(BPF_JGE, BPF_REG_2, BPF_REG_0, 4); > > + insn_buf[2] = BPF_ALU64_IMM(BPF_MUL, BPF_REG_2, 8); > > + insn_buf[3] = BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_1); > > + insn_buf[4] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_2, 0); > > + insn_buf[5] = BPF_JMP_A(1); > > + insn_buf[6] = BPF_MOV64_IMM(BPF_REG_0, 0); > > + > > + new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, 7); > > + if (!new_prog) > > + return -ENOMEM; > > + > > + delta += 6; > > + env->prog = prog = new_prog; > > + insn = new_prog->insnsi + i + delta; > > + continue; > > nit: this whole sequence of steps and calculations seems like > something that might be abstracted and hidden behind a macro or helper > func? Not related to your change, though. But wouldn't it be easier to > understand if it was just written as: > > PATCH_INSNS( > BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8); > BPF_JMP32_REG(BPF_JGE, BPF_REG_2, BPF_REG_0, 4); > BPF_ALU64_IMM(BPF_MUL, BPF_REG_2, 8); > BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_1); > BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_2, 0); > BPF_JMP_A(1); > BPF_MOV64_IMM(BPF_REG_0, 0)); > continue; yep, looks better ;-) I'll check > > ? > > > > + } > > + > > + /* Implement bpf_ret_value inline. */ > > + if (prog_type == BPF_PROG_TYPE_TRACING && > > + insn->imm == BPF_FUNC_ret_value) { > > + /* Load nr_args from ctx - 8 */ > > + insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, -8); > > + insn_buf[1] = BPF_ALU64_IMM(BPF_MUL, BPF_REG_2, 8); > > + insn_buf[2] = BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_1); > > + insn_buf[3] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_2, 0); > > + > > + new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, 4); > > + if (!new_prog) > > + return -ENOMEM; > > + > > + delta += 3; > > + env->prog = prog = new_prog; > > + insn = new_prog->insnsi + i + delta; > > + continue; > > + } > > + > > /* Implement bpf_get_func_ip inline. */ > > if (prog_type == BPF_PROG_TYPE_TRACING && > > insn->imm == BPF_FUNC_get_func_ip) { > > - /* Load IP address from ctx - 8 */ > > - insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8); > > + /* Load IP address from ctx - 16 */ > > + insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -16); > > > > new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, 1); > > if (!new_prog) > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > > index 25ea521fb8f1..3844cfb45490 100644 > > --- a/kernel/trace/bpf_trace.c > > +++ b/kernel/trace/bpf_trace.c > > @@ -1012,7 +1012,7 @@ const struct bpf_func_proto bpf_snprintf_btf_proto = { > > BPF_CALL_1(bpf_get_func_ip_tracing, void *, ctx) > > { > > /* This helper call is inlined by verifier. */ > > - return ((u64 *)ctx)[-1]; > > + return ((u64 *)ctx)[-2]; > > } > > > > static const struct bpf_func_proto bpf_get_func_ip_proto_tracing = { > > @@ -1091,6 +1091,38 @@ static const struct bpf_func_proto bpf_get_branch_snapshot_proto = { > > .arg2_type = ARG_CONST_SIZE_OR_ZERO, > > }; > > > > +BPF_CALL_2(bpf_arg, void *, ctx, int, n) > > +{ > > + /* This helper call is inlined by verifier. */ > > + u64 nr_args = ((u64 *)ctx)[-1]; > > + > > + if ((u64) n >= nr_args) > > + return 0; > > We'll need bpf_get_func_arg_cnt() helper as well to be able to know > the actual number of arguments traced function has. It's impossible to > know whether the argument is zero or there is no argument, otherwise. my idea was that the program will call those helpers based on get_func_ip with proper argument indexes but with bpf_get_func_arg_cnt we could make a simple program that would just print function with all its arguments easily, ok ;-) > > > + return ((u64 *)ctx)[n]; > > +} > > + > > +static const struct bpf_func_proto bpf_arg_proto = { > > + .func = bpf_arg, > > + .gpl_only = true, > > + .ret_type = RET_INTEGER, > > + .arg1_type = ARG_PTR_TO_CTX, > > + .arg1_type = ARG_ANYTHING, > > +}; > > + > > +BPF_CALL_1(bpf_ret_value, void *, ctx) > > +{ > > + /* This helper call is inlined by verifier. */ > > + u64 nr_args = ((u64 *)ctx)[-1]; > > + > > + return ((u64 *)ctx)[nr_args]; > > we should return 0 for fentry or disable this helper for anything but > fexit? It's going to return garbage otherwise. disabling seems like right choice to me thanks, jirka