Re: [LSF/MM/BPF TOPIC] faster uprobes

Jiri Olsa <olsajiri@xxxxxxxxx> · Thu, 7 Mar 2024 10:15:37 +0100



On Tue, Mar 05, 2024 at 03:53:35PM -0800, Song Liu wrote:
> On Tue, Mar 5, 2024 at 9:18 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> >
> > On Fri, Mar 01, 2024 at 11:39:03AM -0800, Kui-Feng Lee wrote:
> > >
> > > On 2/29/24 06:39, Jiri Olsa wrote:
> > > > One of the uprobe pain points is slow execution: it takes two
> > > > traps in the worst case, or a single trap if the original
> > > > instruction can be emulated. Return uprobes need one extra
> > > > trap on top of that.
> > > >
> > > > My current idea on how to make this faster is to follow the optimized
> > > > kprobes and replace the normal uprobe trap instruction with jump to
> > > > user space trampoline that:
> > > >
> > > >    - executes syscall to call uprobe consumers callbacks
> > > >    - executes original instructions
> > > >    - jumps back to continue with the original code
> > > >
> > > > There are of course corner cases where above will have trouble or
> > > > won't work completely, like:
> > > >
> > > >    - executing original instructions in the trampoline is tricky wrt
> > > >      rip relative addressing
> > > >
> > > >    - some instructions we can't move to trampoline at all
> > > >
> > > >    - the uprobe address is on page boundary so the jump instruction to
> > > >      trampoline would span across 2 pages, hence the page replace won't
> > > >      be atomic, which might cause issues
> > > >
> > > >    - ... ? many others I'm sure
> > > >
> > > > Still, even with all the limitations, I think we should be able to
> > > > speed up at least some of the uprobes, which seems worth doing.
> > >
> > > Just a random idea related to this.
> > > Could we also run jit code of bpf programs in the user space to collect
> > > information instead of going back to the kernel every time?
> 
> I was thinking about a similar idea. I guess these user space BPF
> programs will have limited features; we can probably use them to
> update bpf maps. For this limited scope, we still need bpf_arena.
> Otherwise, the user space bpf program will need to update the bpf
> maps with sys_bpf(), which adds the same overhead as triggering
> the program with a syscall.
> 
> >
> > sorry for late reply, do you mean like ubpf? the scope of this change
> > is to speed up the generic uprobe, ebpf is just one of the consumers
> 
> I guess this means we need a new syscall?

yes, that's the idea: to replace the trap with a syscall.
so far I have used a light version of that for initial testing [1]
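
To make the page-boundary and rip-relative corner cases quoted above
concrete, here is a minimal user-space sketch in C. It is not the code
from [1]. It assumes 4 KiB pages and that the trap would be replaced by a
5-byte x86-64 `jmp rel32`; all function and macro names are made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u /* assumption: 4 KiB pages */
#define JMP_REL32_LEN     5u    /* x86-64 jmp rel32: opcode 0xE9 + 32-bit displacement */

/* True if the bytes [addr, addr + len) lie on more than one page. */
static bool spans_two_pages(uint64_t addr, uint64_t len)
{
    assert(len > 0 && len <= PAGE_SIZE_ASSUMED);
    return (addr / PAGE_SIZE_ASSUMED) != ((addr + len - 1) / PAGE_SIZE_ASSUMED);
}

/*
 * True if a jmp rel32 placed at probe_addr can reach trampoline.
 * The displacement is relative to the end of the jump instruction
 * and must fit in a signed 32-bit value.
 */
static bool jmp_rel32_reachable(uint64_t probe_addr, uint64_t trampoline)
{
    int64_t disp = (int64_t)(trampoline - (probe_addr + JMP_REL32_LEN));
    return disp >= INT32_MIN && disp <= INT32_MAX;
}

/*
 * Recompute a rip-relative displacement for an instruction copied to a
 * new location. old_next/new_next are the addresses of the byte right
 * after the instruction at its old and new location. Returns false if
 * the original target is out of reach from the new location.
 */
static bool rebase_rip_disp(uint64_t old_next, uint64_t new_next,
                            int32_t old_disp, int32_t *new_disp)
{
    uint64_t target = old_next + (uint64_t)(int64_t)old_disp;
    int64_t d = (int64_t)(target - new_next);

    if (d < INT32_MIN || d > INT32_MAX)
        return false;
    *new_disp = (int32_t)d;
    return true;
}
```

A probe site would only be a candidate for the jump if
spans_two_pages(addr, JMP_REL32_LEN) is false and the trampoline is
reachable; anything else would stay on the regular trap path.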

jirka


[1] https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=uprobe_syscall_bench_1