On 31/7/24 01:28, Andrii Nakryiko wrote: > On Mon, Jul 29, 2024 at 8:32 PM Leon Hwang <hffilwlqm@xxxxxxxxx> wrote: >> >> >> >> On 30/7/24 05:01, Andrii Nakryiko wrote: >>> On Fri, Jul 26, 2024 at 9:04 PM Leon Hwang <hffilwlqm@xxxxxxxxx> wrote: >>>> >>>> >>>> >>>> On 2024/7/27 08:12, Andrii Nakryiko wrote: >>>>> On Thu, Jul 25, 2024 at 7:57 PM Leon Hwang <hffilwlqm@xxxxxxxxx> wrote: >>>>>> >>>>>> >>>>>> >> >> [...] >> >>>>>> >>>>>> Is it OK to add a tracepoint here? I think tracepoint is more generic >>>>>> than retsnoop-like way. >>>>> >>>>> I personally don't see a problem with adding tracepoint, but how would >>>>> it look like, given we are talking about vararg printf-style function >>>>> calls? I'm not sure how that should be represented in such a way as to >>>>> make it compatible with tracepoints and not cause any runtime >>>>> overhead. >>>> >>>> The tracepoint is not about vararg printf-style function calls. >>>> >>>> It is to trace the reason why it fails to bpf_check_attach_target() at >>>> attach time. >>>> >>> >>> Oh, that changes things. I don't think we can keep adding extra >>> tracepoints for various potential reasons that BPF prog might be >>> failing to verify. >>> >>> But there is usually no need either. This particular code already >>> supports emitting extra information into verifier log, you just have >>> to provide that. This is done by libbpf automatically, can't your >>> library of choice do the same (if BPF program failed). >>> >>> Why go to all this trouble if we already have a facility to debug >>> issues like this. Note every issue is logged into verifier log, but in >>> this case it is. >>> >> >> Yeah, it is unnecessary to add tracepoint here, as we are able to trace >> the log message in bpf_log() arguments with retsnoop. > > My point was that you don't even need retsnoop, you can just ask for > verifier log directly, that's the main way to understand and debug BPF > program verification/load failures. > Nope. It is not about BPF program verification/load failures. It is about freplace program attach failures instead. As for freplace program, it can attach to a different target from the target at load time, since commit 4a1e7c0c63e0 ("bpf: Support attaching freplace programs to multiple attach points"). This patch tries to provide a better way to figure out the reason why a freplace program fails to attach. For example: tcp_connect.c: #include "vmlinux.h" #include <bpf/bpf_helpers.h> static __noinline int stub_handler_static(void) { bpf_printk("freplace, stub handler static\n"); return 0; } __noinline int stub_handler(void) { bpf_printk("freplace, stub handler\n"); return 0; } SEC("kprobe/tcp_connect") int k_tcp_connect(struct pt_regs *ctx) { stub_handler_static(); return stub_handler(); } char __license[] SEC("license") = "GPL"; freplace.c: #include "vmlinux.h" #include <bpf/bpf_helpers.h> SEC("freplace/stub_handler") int freplace_handler(void) { bpf_printk("freplace, replaced handler\n"); return 0; } char __license[] SEC("license") = "GPL"; And, here's pseudo C code snippet with libbpf to show attach failure: tcp_skel = tcp_connect__open_and_load(); prog_fd = tcp_skel->progs.k_tcp_connect; freplace_skel = freplace__open(); freplace_prog = freplace_skel->progs.freplace_handler; err = bpf_program__set_attach_target(freplace_prog, prog_fd, "stub_handler"); err = freplace__load(freplace_skel); freplace_link = bpf_program__attach_freplace(freplace_prog, prog_fd, "stub_handler_static"); freplace_link will be -EINVAL because stub_handler_static() is not a global function, as we have figured out. With this patch, "stub_handler_static() is not a global function" will be printed in dmesg. But we should not pollute dmesg with such message. If there's the tracepoint that I mentioned previously, "stub_handler_static() is not a global function" can be retrieved by the tracepoint. Thanks, Leon