Hi Vishal,
Vishal Chourasia wrote:
> On Fri, Jun 21, 2024 at 12:24:03AM +0530, Naveen N Rao wrote:
> > This is v3 of the patches posted here:
> > http://lkml.kernel.org/r/cover.1718008093.git.naveen@xxxxxxxxxx
> >
> > Since v2, I have addressed review comments from Steven and Masahiro
> > along with a few fixes. Patches 7-11 are new in this series and add
> > support for ftrace direct and bpf trampolines.
> >
> > This series depends on the patch series from Benjamin Gray adding
> > support for patch_ulong():
> > http://lkml.kernel.org/r/20240515024445.236364-1-bgray@xxxxxxxxxxxxx
> >
> > - Naveen
> Hello Naveen,
>
> I've noticed an issue with `kstack()` in bpftrace [1] when using `kfunc`
> compared to `kprobe`. Despite trying all three modes specified in the
> documentation (bpftrace, perf, or raw), the stack isn't unwinding
> properly with `kfunc`.
>
> [1] https://github.com/bpftrace/bpftrace/blob/master/man/adoc/bpftrace.adoc#kstack
> # kprobe_vs_kfunc.sh: compare kstack() from kfunc vs kprobe probes
> set -x
> for mode in bpftrace perf raw; do
> 	bpftrace -e "kfunc:vmlinux:build_sched_domains {@[kstack($mode), comm, tid]=count();}" &
> 	pid=$!; sleep 2
> 	chcpu -d 2-3	# offlining CPUs triggers build_sched_domains()
> 	kill $pid
> 	bpftrace -e "kprobe:build_sched_domains {@[kstack($mode), comm, tid]=count();}" &
> 	pid=$!; sleep 2
> 	chcpu -e 2-3
> 	kill $pid
> done
> # ./kprobe_vs_kfunc.sh
> + bpftrace -e 'kfunc:vmlinux:build_sched_domains {@[kstack(bpftrace), comm, tid]=count();}'
> Attaching 1 probe...
> + chcpu -d 2-3
> CPU 2 disabled
> CPU 3 disabled
> + kill 35214
> @[
>     bpf_prog_cfd8d6c8bb4898ce+972
> , cpuhp/2, 33]: 1
> @[
>     bpf_prog_cfd8d6c8bb4898ce+972
> , cpuhp/3, 38]: 1
Yeah, this is because we don't capture the full register state with bpf
trampolines, unlike with kprobes. The BPF stackmap relies on
perf_arch_fetch_caller_regs() to create a dummy pt_regs for use by
get_perf_callchain(). We end up with a NULL LR, and bpftrace (and most
other userspace tools) stop showing the backtrace when they encounter a
NULL entry. I recall fixing some tools to continue showing the backtrace
in spite of a NULL entry, but I may be misremembering.
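
For reference, here is perf_arch_fetch_caller_regs() on powerpc, from
arch/powerpc/include/asm/perf_event.h in current mainline. It fills in
NIP, GPR1 and the MSR, but regs->link is never populated, so the
unwinder starts out with a NULL LR:

#define perf_arch_fetch_caller_regs(regs, __ip)			\
	do {							\
		(regs)->result = 0;				\
		(regs)->nip = __ip;				\
		(regs)->gpr[1] = current_stack_frame();		\
		asm volatile("mfmsr %0" : "=r" ((regs)->msr));	\
	} while (0)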
Perhaps we should fix/change how the perf callchain is captured in the
kernel. We filter out invalid entries, and we capture an additional
entry for perf since we can't be sure of our return address. We should
revisit this and see if we can align with the usual expectation that a
callchain does not contain NULL entries. Something like this may help,
but it needs more testing, especially on the perf side:
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index 6b4434dd0ff3..9f67b764da92 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -83,12 +83,12 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 			 * We can't tell which of the first two addresses
 			 * we get are valid, but we can filter out the
 			 * obviously bogus ones here. We replace them
-			 * with 0 rather than removing them entirely so
+			 * with -1 rather than removing them entirely so
 			 * that userspace can tell which is which.
 			 */
 			if ((level == 1 && next_ip == lr) ||
 			    (level <= 1 && !kernel_text_address(next_ip)))
-				next_ip = 0;
+				next_ip = -1;
 
 			++level;
 		}
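
With that, a stack walker in userspace could treat -1 as an "entry we
couldn't validate" marker and keep unwinding, instead of stopping at the
first NULL. A minimal sketch of the consumer-side logic (print_kstack()
and its arguments are made up for illustration, this is not actual
bpftrace code):

#include <stdio.h>

static void print_kstack(const unsigned long *ips, int nr)
{
	for (int i = 0; i < nr; i++) {
		if (ips[i] == (unsigned long)-1) {
			/* entry the kernel couldn't validate; keep going */
			printf("    [unreliable]\n");
			continue;
		}
		if (!ips[i])
			break;	/* genuine end of the recorded callchain */
		printf("    %#lx\n", ips[i]);
	}
}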
- Naveen