On Fri, Sep 13, 2024 at 6:50 AM Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
>
> On Fri, 13 Sep 2024 21:45:15 +0900
> Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx> wrote:
>
> > On Fri, 13 Sep 2024 17:59:35 +0900
> > Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx> wrote:
> >
> > > > >
> > > > > Taking failing output from the test:
> > > > >
> > > > > > kprobe_multi_testmod_check:FAIL:kretprobe_test3_result unexpected kretprobe_test3_result: actual 0 != expected 1
> > > >
> > > > kretprobe_test3_result is a sort of identifier for a test condition;
> > > > you can find the corresponding line in the user space .c file by
> > > > grepping for it:
> > > >
> > > > ASSERT_EQ(skel->bss->kretprobe_testmod_test3_result, 1,
> > > >           "kretprobe_test3_result");
> > > >
> > > > Most probably the problem is in:
> > > >
> > > > __u64 addr = bpf_get_func_ip(ctx);
> > > Yeah, and as I replied in another thread, the problem is
> > > that the ftrace entry_ip is not the symbol ip.
> > >
> > > We have the ftrace_call_adjust() arch function for one direction
> > > (symbol ip to ftrace entry ip), but what we need here is the
> > > reverse translation (ftrace entry ip to symbol ip).
> > >
> > > The easiest way is to use kallsyms to find it, but that is a bit
> > > costly (though it only adds a few levels to the bsearch).
> > > Other possible options are:
> > >
> > >  - Change bpf_kprobe_multi_addrs_cmp() to accept a range of
> > >    addresses: [sym_addr, sym_addr + offset) returns true, so
> > >    bpf_kprobe_multi_cookie() can find the entry address.
> > >    The offset depends on the arch, but 16 is enough.
> > Oops, no, this bpf_kprobe_multi_cookie() is used only for storing
> > test data, not for solving the general problem. (I saw
> > kprobe_multi_check().)
> >
> > So solving the problem is more costly: we need to put more
> > arch-dependent code in bpf_trace and make sure it does the reverse
> > translation of ftrace_call_adjust(). (This means that if you expand
> > arch support, you need to add such an implementation.)
>
> OK, can you try this one?
>

I'm running out of time today, so I won't have time to try this, sorry.
But see my last reply: I think adjusting link->addrs once before
attachment is the way to go. It gives us fast and simple lookups at
runtime. In my last reply I assumed that we won't need to keep a copy
of the original addrs (because we can dynamically adjust for
bpf_kprobe_multi_link_fill_link_info()), but I now realize that we do
need that for bpf_get_func_ip() anyway. Still, I'd rather have an
extra link->adj_addrs with a copy and do a quick and simple lookup at
runtime. So I suggest going with that. In the very worst case it's a
few kilobytes of memory for thousands of attached functions, no big
deal, IMO.

It's vastly better than maintaining an arch-specific reverse of
ftrace_call_adjust(), isn't it? Jiri, any opinion here?

>
> From 81bc599911507215aa9faa1077a601880cbd654a Mon Sep 17 00:00:00 2001
> From: "Masami Hiramatsu (Google)" <mhiramat@xxxxxxxxxx>
> Date: Fri, 13 Sep 2024 21:43:46 +0900
> Subject: [PATCH] bpf: Add get_entry_ip() for arm64
>
> Add get_entry_ip() implementation for arm64. This is based on
> the information in ftrace_call_adjust() for arm64.
>
> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>
> ---
>  kernel/trace/bpf_trace.c | 64 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 64 insertions(+)
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index deb629f4a510..b0cf6e5b8965 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -1066,6 +1066,70 @@ static unsigned long get_entry_ip(unsigned long fentry_ip)
>  		fentry_ip -= ENDBR_INSN_SIZE;
>  	return fentry_ip;
>  }
> +#elif defined (CONFIG_ARM64)
> +#include <asm/insn.h>
> +
> +static unsigned long get_entry_ip(unsigned long fentry_ip)
> +{
> +	u32 insn;
> +
> +	/*
> +	 * When using patchable-function-entry without pre-function NOPs, the ftrace
> +	 * entry is the address of the first NOP after the function entry point.
> +	 *
> +	 * The compiler has either generated:
> +	 *
> +	 * func+00:	func:	NOP		// To be patched to MOV X9, LR
> +	 * func+04:		NOP		// To be patched to BL <caller>
> +	 *
> +	 * Or:
> +	 *
> +	 * func-04:		BTI	C
> +	 * func+00:	func:	NOP		// To be patched to MOV X9, LR
> +	 * func+04:		NOP		// To be patched to BL <caller>
> +	 *
> +	 * The fentry_ip is the address of `BL <caller>`, which is at `func + 4`
> +	 * bytes in either case.
> +	 */
> +	if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
> +		return fentry_ip - AARCH64_INSN_SIZE;
> +
> +	/*
> +	 * When using patchable-function-entry with pre-function NOPs, BTI is
> +	 * a bit different.
> +	 *
> +	 * func+00:	func:	NOP		// To be patched to MOV X9, LR
> +	 * func+04:		NOP		// To be patched to BL <caller>
> +	 *
> +	 * Or:
> +	 *
> +	 * func+00:	func:	BTI	C
> +	 * func+04:		NOP		// To be patched to MOV X9, LR
> +	 * func+08:		NOP		// To be patched to BL <caller>
> +	 *
> +	 * The fentry_ip is the address of `BL <caller>`, which is at either
> +	 * `func + 4` or `func + 8`, depending on whether there is a BTI.
> +	 */
> +
> +	/* If there is no BTI, the func address should be one instruction before. */
> +	if (!IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
> +		return fentry_ip - AARCH64_INSN_SIZE;
> +
> +	/* We want to be extra safe in case entry ip is on the page edge,
> +	 * but otherwise we need to avoid get_kernel_nofault()'s overhead.
> +	 */
> +	if ((fentry_ip & ~PAGE_MASK) < AARCH64_INSN_SIZE * 2) {
> +		if (get_kernel_nofault(insn, (u32 *)(fentry_ip - AARCH64_INSN_SIZE * 2)))
> +			return fentry_ip - AARCH64_INSN_SIZE;
> +	} else {
> +		insn = *(u32 *)(fentry_ip - AARCH64_INSN_SIZE * 2);
> +	}
> +
> +	if (aarch64_insn_is_bti(le32_to_cpu((__le32)insn)))
> +		return fentry_ip - AARCH64_INSN_SIZE * 2;
> +
> +	return fentry_ip - AARCH64_INSN_SIZE;
> +}
>  #else
>  #define get_entry_ip(fentry_ip) fentry_ip
>  #endif
> --
> 2.34.1
>
>
>
> --
> Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>
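
For illustration, here is a minimal, self-contained userspace sketch of the
lookup approach suggested above: pre-adjust the attachment addresses once, keep
them in a second sorted array next to the original symbol addresses, and
binary-search that array at runtime to map an fentry_ip back to its symbol
address. All names (adj_addrs, sym_addrs, lookup_sym_addr) and the addresses
are hypothetical and only model the idea; this is not the actual bpf_trace.c
code.

/*
 * Hypothetical sketch of the "adjust once, look up at runtime" idea:
 * adj_addrs[] holds ftrace entry IPs that were already adjusted at
 * attach time, sym_addrs[] holds the corresponding symbol addresses,
 * and both arrays are sorted in the same order. Mapping an fentry_ip
 * back to the symbol address is then a single binary search, with no
 * arch-specific reverse of ftrace_call_adjust() needed at runtime.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return x < y ? -1 : x > y ? 1 : 0;
}

/* Made-up addresses: entry IP = symbol address + 4 or + 8. */
static const uint64_t adj_addrs[] = { 0x1004, 0x2008, 0x3004 };
static const uint64_t sym_addrs[] = { 0x1000, 0x2000, 0x3000 };

static uint64_t lookup_sym_addr(uint64_t fentry_ip)
{
	const uint64_t *found;

	found = bsearch(&fentry_ip, adj_addrs,
			sizeof(adj_addrs) / sizeof(adj_addrs[0]),
			sizeof(adj_addrs[0]), cmp_u64);
	if (!found)
		return 0;

	/* Same index in the parallel symbol-address array. */
	return sym_addrs[found - adj_addrs];
}

int main(void)
{
	uint64_t fentry_ip = 0x2008;

	printf("fentry_ip 0x%llx -> symbol 0x%llx\n",
	       (unsigned long long)fentry_ip,
	       (unsigned long long)lookup_sym_addr(fentry_ip));
	return 0;
}

The design assumption is that both arrays stay sorted in the same order, so
the index found in adj_addrs also indexes sym_addrs; the per-link memory cost
is one extra address per attached function, as noted above.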