Re: [LSF/MM/BPF TOPIC] faster uprobes

Hi!

I ran a basic experiment with bpftime that combines bpftime's user space
trampoline with a bpf_prog_test_run syscall to run the eBPF code in the
kernel. On my laptop, it was about 2-3x faster than the original
trap-based uprobe.

The experiment code is at
https://github.com/eunomia-bpf/bpftime/blob/71f13ae80e93e8ff45e1b0320c25ff14cb25b4ba/runtime/src/bpftime_prog.cpp#L113

(That's just a PoC, not kernel patches.)


On Fri, Mar 1, 2024 at 5:27 PM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Fri, Mar 1, 2024 at 9:01 AM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Fri, Mar 1, 2024 at 12:18 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> > >
> > > On Thu, Feb 29, 2024 at 04:25:17PM -0800, Andrii Nakryiko wrote:
> > > > On Thu, Feb 29, 2024 at 6:39 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> > > > >
> > > > > One of uprobe pain points is having slow execution that involves
> > > > > two traps in worst case scenario or single trap if the original
> > > > > instruction can be emulated. For return uprobes there's one extra
> > > > > trap on top of that.
> > > > >
> > > > > My current idea on how to make this faster is to follow the optimized
> > > > > kprobes and replace the normal uprobe trap instruction with jump to
> > > > > user space trampoline that:
> > > > >
> > > > >   - executes syscall to call uprobe consumers callbacks
> > > >
> > > > Did you get a chance to measure relative performance of syscall vs
> > > > int3 interrupt handling? If not, do you think you'll be able to get
> > > > some numbers by the time the conference starts? This should inform the
> > > > decision whether it even makes sense to go through all the trouble.
> > >
> > > right, will do that
> >
> > I believe Yusheng measured syscall vs uprobe performance
> > difference during LPC. iirc it was something like 3x.
>
> Do you have a link to slides? Was it actual uprobe vs just some fast
> syscall (not doing BPF program execution) comparison? Or comparing the
> performance of int3 handling vs equivalent syscall handling.
>
> I suspect it's the former, and so probably not that representative.
> I'm curious about the performance of going
> userspace->kernel->userspace through int3 vs syscall (all other things
> being equal).
>
> > Certainly necessary to have a benchmark.
> > selftests/bpf/bench has one for uprobe.
> > Probably should extend with sys_bpf.
> >
> > Regarding:
> > > replace the normal uprobe trap instruction with jump to
> > > user space trampoline
> >
> > it should probably be a call to trampoline instead of a jump.
> > Unless you plan to generate a different trampoline for every location ?
> >
> > Also how would you pick a space for a trampoline in the target process ?
> > Analyze /proc/pid/maps and look for gaps in executable sections?
>
> kernel already does that for uretprobes, it adds a new "[uprobes]"
> memory mapping, so this part is already implemented
>
> >
> > We can start simple with a USDT that uses nop5 instead of nop1
> > and explicit single trampoline for all USDT locations
> > that saves all (callee and caller saved) registers and
> > then does sys_bpf with a new cmd.
> >
> > To replace nop5 with a call to trampoline we can use text_poke_bp
> > approach: replace 1st byte with int3, replace 2-5 with target addr,
> > replace 1st byte to make an actual call insn.
> >
> > Once patched there will be no simulation of insns or kernel traps.
> > Just normal user code that calls into trampoline, that calls sys_bpf,
> > and returns back.
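
For reference, the three-step nop5-to-call patching quoted above could be
sketched roughly as below. This is only a user space illustration on a plain
byte buffer, assuming x86-64; the names call_rel32 and patch_nop5_to_call are
hypothetical, and it is not the kernel's text_poke_bp implementation (which
also synchronizes other CPUs between the steps).

```c
#include <assert.h>
#include <stdint.h>

enum {
    OP_INT3       = 0xCC, /* one-byte breakpoint */
    OP_CALL_REL32 = 0xE8, /* call with 32-bit relative displacement */
    NOP5_LEN      = 5     /* nop5 and call rel32 are both 5 bytes long */
};

/* Hypothetical helper: the displacement of a call at site_addr that targets
 * tramp_addr. It is relative to the end of the 5-byte instruction. */
static int32_t call_rel32(uint64_t site_addr, uint64_t tramp_addr)
{
    int64_t delta = (int64_t)tramp_addr - (int64_t)(site_addr + NOP5_LEN);

    /* The trampoline has to be within +/-2GB of the probed location. */
    assert(delta >= INT32_MIN && delta <= INT32_MAX);
    return (int32_t)delta;
}

/* Hypothetical helper: rewrite the 5 bytes at 'site' (a nop5 located at
 * virtual address site_addr) into "call tramp_addr" in three steps. */
static void patch_nop5_to_call(uint8_t *site, uint64_t site_addr,
                               uint64_t tramp_addr)
{
    uint32_t rel = (uint32_t)call_rel32(site_addr, tramp_addr);

    /* Step 1: put int3 in byte 1, so a thread that reaches the site while
     * it is being patched traps instead of decoding a half-written insn. */
    site[0] = OP_INT3;

    /* Step 2: write the little-endian displacement into bytes 2-5. */
    site[1] = (uint8_t)(rel & 0xff);
    site[2] = (uint8_t)((rel >> 8) & 0xff);
    site[3] = (uint8_t)((rel >> 16) & 0xff);
    site[4] = (uint8_t)((rel >> 24) & 0xff);

    /* Step 3: replace int3 with the call opcode to complete the insn. */
    site[0] = OP_CALL_REL32;
}
```

Applied to a buffer holding nop5 (0f 1f 44 00 00) at address 0x1000 with a
trampoline at 0x2000, this would leave e8 fb 0f 00 00 in the buffer, since
0x2000 - 0x1005 = 0xffb.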




