On Mon, 7 Aug 2023 22:48:29 +0200 Jiri Olsa <olsajiri@xxxxxxxxx> wrote: > On Fri, Aug 04, 2023 at 12:42:06AM +0900, Masami Hiramatsu wrote: > > SNIP > > > > > > > On the other hand, untangling all code paths that come from > > > trampolines (with a light regs structure) from those that come from an > > > exception (with a pt_regs) could lead to a lot of duplicated code, and > > > converting between each subsystem's idea of a light regs structure > > > (what if perf introduces a perf_regs now ?) would be tedious and slow > > > (lots of copies ?). > > > > This is one discussion point I think. Actually, using pt_regs in kretprobe > > (and rethook) is histrical accident. Originally, it had put a kprobe on > > the function return trampoline to hook it. So keep the API compatiblity > > I made the hand assembled code to save the pt_regs on the stack. > > > > My another question is if we have the fprobe to trace (hook) the function > > return, why we still need the kretprobe itself. I think we can remove > > kretprobe and use fprobe exit handler, because "function" probing will > > be done by fprobe, not kprobe. And then, we can simplify the kprobe > > interface and clarify what it is -- "kprobe is a wrapper of software > > breakpoint". And we don't need to think about duplicated code anymore :) > > 1+ sounds like good idea Thanks! the downside will be that it requires to enable CONFIG_FPROBE instead of CONFIG_KPROBES, but I think it is natural that the user, who wants to trace function boundary, enables CONFIG_FUNCTION_TRACER. > > > > Otherwise, ftrace_regs() has support on arm64 for getting to the argument > > > > registers and the stack. Even live kernel patching now uses ftrace_regs(). > > > > > > > > > > > > > > If you guys decide to convert fprobe to ftrace_regs please > > > > > make it depend on kconfig or something. > > > > > bpf side needs full pt_regs. > > > > > > Some wild ideas that I brought up once in a BPF office hour: BPF > > > "multi_kprobe" could provide a fake pt_regs (either by constructing a > > > sparse one on the stack or by JIT-ing different offset accesses and/or > > > by having the verifier deny access to unpopulated fields) or break the > > > current API (is it conceivable to phase out BPF "multi_kprobe" > > > programs in favor of BPF "fprobe" programs that don't lie about their > > > API and guarantees and just provide a ftrace_regs ?) > > > > +1 :) > > so multi_kprobe link was created to allow fast attach of BPF kprobe-type > programs to multiple functions.. I don't think there's need for new fprobe > program Ah, OK. So the focus point is shortening registration time. > > > > > > > > > > Then use kprobes. When I asked Masami what the difference between fprobes > > > > and kprobes was, he told me that it would be that it would no longer rely > > > > on the slower FTRACE_WITH_REGS. But currently, it still does. > > > > > > Actually... Moving fprobe to ftrace_regs should get even more spicy! > > > :) Fprobe also wraps "rethook" which is basically the same thing as > > > kretprobe: a return trampoline that saves a pt_regs, to the point that > > > on x86 kretprobe's trampoline got dropped in favor of rethook's > > > trampoline. But for the same reasons that we don't want ftrace to save > > > pt_regs on arm64, rethook should probably also just save a ftrace_regs > > > ? (also, to keep the fprobe callback signatures consistent between > > > pre- and post- handlers). But if we want fprobe "post" callbacks to > > > save a ftrace_regs now, either we need to re-introduce the kretprobe > > > trampoline or also change the API of kretprobe (and break its symmetry > > > with kprobe and we'd have the same problem all over again with BPF > > > kretprobe program types...). All of this is "beautifully" entangled... > > > :) > > > > As I said, I would like to phase out the kretprobe itself because it > > provides the same feature of fprobe, which is confusing. jprobe was > > removed a while ago, and now kretprobe is. But we can not phase out > > it at once. So I think we will keep current kretprobe trampoline on > > arm64 and just add new ftrace_regs based rethook. Then remove the > > API next release. (after all users including systemtap is moved) > > > > > > > > > The reason I started the FTRACE_WITH_ARGS (which gave us ftrace_regs) in > > > > the first place, was because of the overhead you reported to me with > > > > ftrace_regs_caller and why you wanted to go the direct trampoline approach. > > > > That's when I realized I could use a subset because those registers were > > > > already being saved. The only reason FTRACE_WITH_REGS was created was it > > > > had to supply full pt_regs (including flags) and emulate a breakpoint for > > > > the kprobes interface. But in reality, nothing really needs all that. > > > > > > > > > It's not about access to args. > > > > > pt_regs is passed from bpf prog further into all kinds of perf event > > > > > functions including stack walking. > > > > > > If all accesses are done in BPF bytecode, we could (theoretically) > > > have the verifier and JIT work together to deny accesses to > > > unpopulated fields, or relocate pt_regs accesses to ftrace_regs > > > accesses to keep backward compatibility with existing multi_kprobe BPF > > > programs. > > > > Yeah, that is what I would like to suggest, and what my patch does. > > (let me update rethook too, it'll be a bit tricky since I don't want > > break anything) > > > > Thanks, > > > > > > > > Is there a risk that a "multi_kprobe" program could call into a BPF > > > helper or kfunc that reads this pt_regs pointer and expect certain > > > fields to be set ? I suppose we could also deny giving that "pt_regs" > > > pointer to a helper... :/ > > I think Alexei answered this earlier in the thread: > > >From bpf side we don't care that such pt_regs is 100% filled in or > >only partial as long as this pt_regs pointer is valid for perf_event_output > >and stack walking that consume pt_regs. > >I believe that was and still is the case for both x86 and arm64. OK, so I've made the ftrace_partial_regs() according to the idea now. Thanks, > > > jirka -- Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>