On Mon, Oct 26, 2020 at 09:30:14PM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 22, 2020 at 10:42:05AM -0400, Steven Rostedt wrote:
> > On Thu, 22 Oct 2020 16:11:54 +0200
> > Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
> >
> > > I understand direct calls as a way that bpf trampolines and ftrace can
> > > co-exist together - ebpf trampolines need that functionality of accessing
> > > parameters of a function as if it was called directly and at the same
> > > point we need to be able attach to any function and to as many functions
> > > as we want in a fast way
> >
> > I was sold that bpf needed a quick and fast way to get the arguments of a
> > function, as the only way to do that with ftrace is to save all registers,
> > which, I was told was too much overhead, as if you only care about
> > arguments, there's much less that is needed to save.
> >
> > Direct calls wasn't added so that bpf and ftrace could co-exist, it was
> > that for certain cases, bpf wanted a faster way to access arguments,
> > because it still worked with ftrace, but the saving of regs was too
> > strenuous.
>
> Direct calls in ftrace were done so that ftrace and trampoline can co-exist.
> There is no other use for it.
>
> Jiri,
> could you please redo your benchmarking hardcoding ftrace_managed=false ?
> If going through register_ftrace_direct() is indeed so much slower
> than arch_text_poke() then something gotta give.
> Either register_ftrace_direct() has to become faster or users
> have to give up on co-existing of bpf and ftrace.
> So far not a single user cared about using trampoline and ftrace together.
> So the latter is certainly an option.

I tried that, and IIRC it was not much faster, but I don't have details
on that.. it should be a quick check though, so I'll do it anyway

later I realized that we need ftrace to stay, so I abandoned this
idea ;-) and started to check on how to keep them both together and
just make it faster

also, bpf trampolines currently will not work without ftrace being
enabled, because ftrace does the preparation work at compile time and
replaces all the fentry calls with nop instructions, and the replace
code depends on those nops... so if we go this way, we would need to
make this preparation code generic

>
> Regardless, the patch 7 (rbtree of kallsyms) is probably good on its own.
> Can you benchmark it independently and maybe resubmit if it's useful
> without other patches?

yes, I'll submit that in a separate patch

thanks,
jirka