On Sat, Mar 19, 2022 at 01:27:56PM +0100, Jiri Olsa wrote:
> On Fri, Mar 18, 2022 at 10:50:46PM -0700, Andrii Nakryiko wrote:
> > On Wed, Mar 16, 2022 at 5:24 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
> > >
> > > hi,
> > > this patchset adds a new link type BPF_TRACE_KPROBE_MULTI that attaches
> > > kprobe programs through the fprobe API [1] introduced by Masami.
> > >
> > > The fprobe API allows attaching a probe to multiple functions at once
> > > very quickly, because it works on top of ftrace. On the other hand, this
> > > limits the probe points to function entry and return.
> > >
> > > With bpftrace support I see the following attach speed:
> > >
> > >   # perf stat --null -r 5 ./src/bpftrace -e 'kprobe:x* { } i:ms:1 { exit(); } '
> > >   Attaching 2 probes...
> > >   Attaching 3342 functions
> > >   ...
> > >
> > >   1.4960 +- 0.0285 seconds time elapsed  ( +- 1.91% )
> > >
> > > v3 changes:
> > >   - based on latest fprobe post from Masami [2]
> > >   - add acks
> > >   - add extra comment to kprobe_multi_link_handler wrt entry ip setup [Masami]
> > >   - keep swap_words_64 static and swap values directly in
> > >     bpf_kprobe_multi_cookie_swap [Andrii]
> > >   - rearrange locking/migrate setup in kprobe_multi_link_prog_run [Andrii]
> > >   - move uapi fields [Andrii]
> > >   - add bpf_program__attach_kprobe_multi_opts function [Andrii]
> > >   - many small test changes [Andrii]
> > >   - added tests for bpf_program__attach_kprobe_multi_opts
> > >   - make kallsyms_lookup_name check for empty string [Andrii]
> > >
> > > v2 changes:
> > >   - based on latest fprobe changes [1]
> > >   - renaming the uapi interface to kprobe multi
> > >   - adding support for sort_r to pass a user pointer to swap functions
> > >     and using that in cookie support to keep just a single functions array
> > >   - moving the new link to the kernel/trace/bpf_trace.c file
> > >   - using a single fprobe callback function for entry and exit
> > >   - using kvzalloc, libbpf_ensure_mem functions
> > >   - adding new k[ret]probe.multi sections instead of using current kprobe
> > >   - used glob_match from test_progs.c, added '?' matching
> > >   - move bpf_get_func_ip verifier inline change to a separate patch
> > >   - couple of other minor fixes
> > >
> > > Also available at:
> > >   https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
> > >   bpf/kprobe_multi
> > >
> > > thanks,
> > > jirka
> > >
> > > [1] https://lore.kernel.org/bpf/164458044634.586276.3261555265565111183.stgit@devnote2/
> > > [2] https://lore.kernel.org/bpf/164735281449.1084943.12438881786173547153.stgit@devnote2/
> > > ---
> > > Jiri Olsa (13):
> > >       lib/sort: Add priv pointer to swap function
> > >       kallsyms: Skip the name search for empty string
> > >       bpf: Add multi kprobe link
> > >       bpf: Add bpf_get_func_ip kprobe helper for multi kprobe link
> > >       bpf: Add support to inline bpf_get_func_ip helper on x86
> > >       bpf: Add cookie support to programs attached with kprobe multi link
> > >       libbpf: Add libbpf_kallsyms_parse function
> > >       libbpf: Add bpf_link_create support for multi kprobes
> > >       libbpf: Add bpf_program__attach_kprobe_multi_opts function
> > >       selftests/bpf: Add kprobe_multi attach test
> > >       selftests/bpf: Add kprobe_multi bpf_cookie test
> > >       selftests/bpf: Add attach test for bpf_program__attach_kprobe_multi_opts
> > >       selftests/bpf: Add cookie test for bpf_program__attach_kprobe_multi_opts
> >
> > Ok, so I've integrated multi-attach kprobes into retsnoop. It was
> > pretty straightforward.
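
For reference, attaching through the new bpf_program__attach_kprobe_multi_opts()
API looks roughly like this. This is only a sketch against the libbpf patches
above (the prog argument and the symbol/cookie values are hypothetical, and
error handling is left out):

	#include <bpf/libbpf.h>

	/* prog would come from a loaded bpf_object or skeleton */
	static struct bpf_link *attach_multi(struct bpf_program *prog)
	{
		const char *syms[] = { "__sys_bpf", "bpf_tracing_prog_attach" };
		__u64 cookies[] = { 0x100, 0x200 };
		LIBBPF_OPTS(bpf_kprobe_multi_opts, opts,
			.syms = syms,
			.cookies = cookies,
			.cnt = 2,
			.retprobe = true,
		);

		/* pattern is NULL because explicit symbols are passed;
		 * a glob like "bpf_*" could be used instead of syms/cnt
		 */
		return bpf_program__attach_kprobe_multi_opts(prog, NULL, &opts);
	}
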
> > Here are some numbers for the speed of attaching and, even more
> > importantly, detaching for a set of almost 400 functions. It's a bit
> > fewer functions for the fentry-based mode due to more limited BTF
> > information for static functions. Note that retsnoop attaches two BPF
> > programs for each kernel function, so it's actually two multi-attach
> > kprobes, each attaching to 397 functions. For single-attach kprobes,
> > we perform 397 * 2 = 794 attachments.
> >
> > I've been invoking retsnoop with the following specified set of
> > functions: -e '*sys_bpf*' -a ':kernel/bpf/syscall.c' -a
> > ':kernel/bpf/verifier.c' -a ':kernel/bpf/btf.c' -a
> > ':kernel/bpf/core.c'. So basically the bpf syscall entry functions and
> > all the discoverable functions from syscall.c, verifier.c, core.c and
> > btf.c in the kernel/bpf subdirectory.
> >
> > Results:
> >
> >   fentry attach/detach       (263 kfuncs): 2133 ms / 31671 ms (33 seconds)
> >   kprobe attach/detach       (397 kfuncs): 3121 ms / 13195 ms (16 seconds)
> >   multi-kprobe attach/detach (397 kfuncs):    9 ms /   248 ms (0.25 seconds)
> >
> > So as you can see, the speed-up is tremendous! The API is also very
> > convenient; I didn't have to modify retsnoop internals much to
> > accommodate the multi-attach API. Great job!
>
> nice! thanks for doing that so quickly
>
> > Now for the bad news. :(
> >
> > Stack traces from multi-attach kretprobes are broken, which makes all
> > this way less useful.
> >
> > Below, see stack traces captured with multi- and single-attach
> > kretprobes for two different use cases. The single-kprobe stack traces
> > make much more sense. Ignore that the last function doesn't have an
> > actual address associated with it (i.e., __sys_bpf and
> > bpf_tracing_prog_attach, respectively); that's just how retsnoop does
> > things, I think. We were actually capturing stack traces from inside
> > __sys_bpf (first case) and bpf_tracing_prog_attach (second case).
> >
> > MULTI KPROBE:
> > ffffffff81185a80 __sys_bpf+0x0
> >                  (kernel/bpf/syscall.c:4622:1)
> >
> > SINGLE KPROBE:
> > ffffffff81e0007c entry_SYSCALL_64_after_hwframe+0x44
> >                  (arch/x86/entry/entry_64.S:113:0)
> > ffffffff81cd2b15 do_syscall_64+0x35
> >                  (arch/x86/entry/common.c:80:7)
> >                  . do_syscall_x64
> >                  (arch/x86/entry/common.c:50:12)
> > ffffffff811881aa __x64_sys_bpf+0x1a
> >                  (kernel/bpf/syscall.c:4765:1)
> >                  __sys_bpf
> >
> > MULTI KPROBE:
> > ffffffff811851b0 bpf_tracing_prog_attach+0x0
> >                  (kernel/bpf/syscall.c:2708:1)
> >
> > SINGLE KPROBE:
> > ffffffff81e0007c entry_SYSCALL_64_after_hwframe+0x44
> >                  (arch/x86/entry/entry_64.S:113:0)
> > ffffffff81cd2b15 do_syscall_64+0x35
> >                  (arch/x86/entry/common.c:80:7)
> >                  . do_syscall_x64
> >                  (arch/x86/entry/common.c:50:12)
> > ffffffff811881aa __x64_sys_bpf+0x1a
> >                  (kernel/bpf/syscall.c:4765:1)
> > ffffffff81185e79 __sys_bpf+0x3f9
> >                  (kernel/bpf/syscall.c:4705:9)
> > ffffffff8118583a bpf_raw_tracepoint_open+0x19a
> >                  (kernel/bpf/syscall.c:3069:6)
> >                  bpf_tracing_prog_attach
> >
> > You can see that with the multi-attach kprobe we get only one entry,
> > which is the very last function in the stack trace. No parent function
> > is captured whatsoever. Jiri, Masami, any ideas what's wrong and how
> > to fix this? Let's try to figure this out and fix it before the
> > feature makes it into the kernel release. Thanks in advance!
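
For context, the BPF side of such a stack-collecting multi-kretprobe looks
roughly like the sketch below. It uses the kretprobe.multi section type added
by this series and is not retsnoop's actual program; the handler itself is
fine, it is the rewritten regs->ip that breaks the unwind:

	#include "vmlinux.h"
	#include <bpf/bpf_helpers.h>

	char LICENSE[] SEC("license") = "GPL";

	SEC("kretprobe.multi/bpf_*")
	int handle_exit(struct pt_regs *ctx)
	{
		__u64 buf[32];

		/* walk the kernel stack starting from ctx; with regs->ip
		 * rewritten to the function entry, the walk stops after
		 * a single frame
		 */
		bpf_get_stack(ctx, buf, sizeof(buf), 0);
		return 0;
	}
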
> oops, I should have tried kstack with the bpftrace's kretprobe, I see
> the same:
>
>   # ./src/bpftrace -e 'kretprobe:x* { @[kstack] = count(); }'
>   Attaching 1 probe...
>   Attaching 3340 probes
>   ^C
>
>   @[
>       xfs_trans_apply_dquot_deltas+0
>   ]: 22
>   @[
>       xlog_cil_commit+0
>   ]: 22
>   @[
>       xlog_grant_push_threshold+0
>   ]: 22
>   @[
>       xfs_trans_add_item+0
>   ]: 22
>   @[
>       xfs_log_reserve+0
>   ]: 22
>   @[
>       xlog_space_left+0
>   ]: 22
>   @[
>       xfs_buf_offset+0
>   ]: 22
>   @[
>       xfs_trans_free_dqinfo+0
>   ]: 22
>   @[
>       xlog_ticket_alloc+0
>       xfs_log_reserve+5
>   ]: 22
>   @[
>       xfs_cil_prepare_item+0
>
> I think it's because we set the original ip for the return probe so
> that bpf_get_func_ip works properly, but of course it breaks the
> backtrace
>
> I'm not sure we can bring along the original regs for the return probe,
> but I have an idea how to work around the bpf_get_func_ip issue and
> keep the registers intact for the other helpers

The change below uses bpf_run_ctx to store the link and entry ip on the
stack, where the helpers can find them. It fixed the kretprobe backtrace
for me.

I had to revert the get_func_ip inline; it's squashed into the change
below for quick testing. I'll send the revert as a separate patch with
the formal change.

could you please test?

thanks,
jirka

---
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0287176bfe9a..cf92f9c01556 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -13678,7 +13678,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			continue;
 		}
 
-		/* Implement tracing bpf_get_func_ip inline. */
+		/* Implement bpf_get_func_ip inline. */
 		if (prog_type == BPF_PROG_TYPE_TRACING &&
 		    insn->imm == BPF_FUNC_get_func_ip) {
 			/* Load IP address from ctx - 16 */
@@ -13693,25 +13693,6 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			continue;
 		}
 
-#ifdef CONFIG_X86
-		/* Implement kprobe_multi bpf_get_func_ip inline. */
-		if (prog_type == BPF_PROG_TYPE_KPROBE &&
-		    eatype == BPF_TRACE_KPROBE_MULTI &&
-		    insn->imm == BPF_FUNC_get_func_ip) {
-			/* Load IP address from ctx (struct pt_regs) ip */
-			insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
-						  offsetof(struct pt_regs, ip));
-
-			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, 1);
-			if (!new_prog)
-				return -ENOMEM;
-
-			env->prog = prog = new_prog;
-			insn = new_prog->insnsi + i + delta;
-			continue;
-		}
-#endif
-
 patch_call_imm:
 		fn = env->ops->get_func_proto(insn->imm, env->prog);
 		/* all functions that have prototype and verifier allowed
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 9a7b6be655e4..aefe33c08296 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -80,7 +80,8 @@ u64 bpf_get_stack(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
 static int bpf_btf_printf_prepare(struct btf_ptr *ptr, u32 btf_ptr_size,
 				  u64 flags, const struct btf **btf,
 				  s32 *btf_id);
-static u64 bpf_kprobe_multi_cookie(struct bpf_run_ctx *ctx, u64 ip);
+static u64 bpf_kprobe_multi_cookie(struct bpf_run_ctx *ctx);
+static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx *ctx);
 
 /**
  * trace_call_bpf - invoke BPF program
@@ -1042,8 +1043,7 @@ static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe = {
 
 BPF_CALL_1(bpf_get_func_ip_kprobe_multi, struct pt_regs *, regs)
 {
-	/* This helper call is inlined by verifier on x86. */
-	return instruction_pointer(regs);
+	return bpf_kprobe_multi_entry_ip(current->bpf_ctx);
 }
 
 static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe_multi = {
@@ -1055,7 +1055,7 @@ static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe_multi = {
 
 BPF_CALL_1(bpf_get_attach_cookie_kprobe_multi, struct pt_regs *, regs)
 {
-	return bpf_kprobe_multi_cookie(current->bpf_ctx, instruction_pointer(regs));
+	return bpf_kprobe_multi_cookie(current->bpf_ctx);
 }
 
 static const struct bpf_func_proto bpf_get_attach_cookie_proto_kmulti = {
@@ -2220,15 +2220,16 @@ struct bpf_kprobe_multi_link {
 	struct bpf_link link;
 	struct fprobe fp;
 	unsigned long *addrs;
-	/*
-	 * The run_ctx here is used to get struct bpf_kprobe_multi_link in
-	 * get_attach_cookie helper, so it can't be used to store data.
-	 */
-	struct bpf_run_ctx run_ctx;
 	u64 *cookies;
 	u32 cnt;
 };
 
+struct bpf_kprobe_multi_run_ctx {
+	struct bpf_run_ctx run_ctx;
+	struct bpf_kprobe_multi_link *link;
+	unsigned long entry_ip;
+};
+
 static void bpf_kprobe_multi_link_release(struct bpf_link *link)
 {
 	struct bpf_kprobe_multi_link *kmulti_link;
@@ -2282,18 +2283,21 @@ static int bpf_kprobe_multi_cookie_cmp(const void *a, const void *b, const void
 	return __bpf_kprobe_multi_cookie_cmp(a, b);
 }
 
-static u64 bpf_kprobe_multi_cookie(struct bpf_run_ctx *ctx, u64 ip)
+static u64 bpf_kprobe_multi_cookie(struct bpf_run_ctx *ctx)
 {
+	struct bpf_kprobe_multi_run_ctx *run_ctx;
 	struct bpf_kprobe_multi_link *link;
+	u64 *cookie, entry_ip;
 	unsigned long *addr;
-	u64 *cookie;
 
 	if (WARN_ON_ONCE(!ctx))
 		return 0;
-	link = container_of(ctx, struct bpf_kprobe_multi_link, run_ctx);
+	run_ctx = container_of(current->bpf_ctx, struct bpf_kprobe_multi_run_ctx, run_ctx);
+	link = run_ctx->link;
+	entry_ip = run_ctx->entry_ip;
 	if (!link->cookies)
 		return 0;
-	addr = bsearch(&ip, link->addrs, link->cnt, sizeof(ip),
+	addr = bsearch(&entry_ip, link->addrs, link->cnt, sizeof(entry_ip),
 		       __bpf_kprobe_multi_cookie_cmp);
 	if (!addr)
 		return 0;
@@ -2301,10 +2305,22 @@ static u64 bpf_kprobe_multi_cookie(struct bpf_run_ctx *ctx, u64 ip)
 	return *cookie;
 }
 
+static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx *ctx)
+{
+	struct bpf_kprobe_multi_run_ctx *run_ctx;
+
+	run_ctx = container_of(current->bpf_ctx, struct bpf_kprobe_multi_run_ctx, run_ctx);
+	return run_ctx->entry_ip;
+}
+
 static int
 kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
-			   struct pt_regs *regs)
+			   unsigned long entry_ip, struct pt_regs *regs)
 {
+	struct bpf_kprobe_multi_run_ctx run_ctx = {
+		.link = link,
+		.entry_ip = entry_ip,
+	};
 	struct bpf_run_ctx *old_run_ctx;
 	int err;
 
@@ -2315,7 +2331,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
 
 	migrate_disable();
 	rcu_read_lock();
-	old_run_ctx = bpf_set_run_ctx(&link->run_ctx);
+	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
 	err = bpf_prog_run(link->link.prog, regs);
 	bpf_reset_run_ctx(old_run_ctx);
 	rcu_read_unlock();
@@ -2330,24 +2346,10 @@ static void
 kprobe_multi_link_handler(struct fprobe *fp, unsigned long entry_ip,
 			  struct pt_regs *regs)
 {
-	unsigned long saved_ip = instruction_pointer(regs);
 	struct bpf_kprobe_multi_link *link;
 
-	/*
-	 * Because fprobe's regs->ip is set to the next instruction of
-	 * dynamic-ftrace instruction, correct entry ip must be set, so
-	 * that the bpf program can access entry address via regs as same
-	 * as kprobes.
-	 *
-	 * Both kprobe and kretprobe see the entry ip of traced function
-	 * as instruction pointer.
-	 */
-	instruction_pointer_set(regs, entry_ip);
-
 	link = container_of(fp, struct bpf_kprobe_multi_link, fp);
-	kprobe_multi_link_prog_run(link, regs);
-
-	instruction_pointer_set(regs, saved_ip);
+	kprobe_multi_link_prog_run(link, entry_ip, regs);
 }
 
 static int
@@ -2514,7 +2516,11 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
 {
 	return -EOPNOTSUPP;
 }
-static u64 bpf_kprobe_multi_cookie(struct bpf_run_ctx *ctx, u64 ip)
+static u64 bpf_kprobe_multi_cookie(struct bpf_run_ctx *ctx)
+{
+	return 0;
+}
+static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx *ctx)
 {
 	return 0;
 }
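
With the run_ctx kept on the stack, both helpers resolve their values through
current->bpf_ctx and regs are left untouched for stack walking. A sketch of a
program exercising them (hypothetical names, not part of the patch):

	#include "vmlinux.h"
	#include <bpf/bpf_helpers.h>

	char LICENSE[] SEC("license") = "GPL";

	SEC("kretprobe.multi/bpf_*")
	int handle_ret(struct pt_regs *ctx)
	{
		/* both values now come from the on-stack
		 * bpf_kprobe_multi_run_ctx, not from regs->ip
		 */
		__u64 ip = bpf_get_func_ip(ctx);
		__u64 cookie = bpf_get_attach_cookie(ctx);

		bpf_printk("ret %llx cookie %llu", ip, cookie);
		return 0;
	}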