Re: [LSF/MM/BPF TOPIC] faster uprobes

On Fri, Mar 1, 2024 at 9:01 AM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Fri, Mar 1, 2024 at 12:18 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> >
> > On Thu, Feb 29, 2024 at 04:25:17PM -0800, Andrii Nakryiko wrote:
> > > On Thu, Feb 29, 2024 at 6:39 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> > > >
> > > > One of the uprobe pain points is slow execution, which involves
> > > > two traps in the worst-case scenario, or a single trap if the original
> > > > instruction can be emulated. For return uprobes there's one extra
> > > > trap on top of that.
> > > >
> > > > My current idea on how to make this faster is to follow the approach of
> > > > optimized kprobes and replace the normal uprobe trap instruction with a
> > > > jump to a user space trampoline that:
> > > >
> > > >   - executes a syscall to call the uprobe consumers' callbacks
> > >
> > > Did you get a chance to measure the relative performance of syscall vs
> > > int3 interrupt handling? If not, do you think you'll be able to get
> > > some numbers by the time the conference starts? This should inform the
> > > decision on whether it even makes sense to go through all the trouble.
> >
> > right, will do that
>
> I believe Yusheng measured the performance difference between syscall
> and uprobe during LPC. IIRC it was something like 3x.

Do you have a link to the slides? Was it a comparison of an actual uprobe
vs. just some fast syscall (not doing BPF program execution)? Or did it
compare the performance of int3 handling vs. equivalent syscall handling?

I suspect it's the former, and so probably not that representative.
I'm curious about the performance of going
userspace->kernel->userspace through int3 vs syscall (all other things
being equal).

> A benchmark is certainly necessary.
> selftests/bpf/bench has one for uprobes.
> It should probably be extended with sys_bpf.
>
> Regarding:
> > replace the normal uprobe trap instruction with jump to
> > user space trampoline
>
> it should probably be a call to the trampoline instead of a jump,
> unless you plan to generate a different trampoline for every location?
>
> Also, how would you pick space for a trampoline in the target process?
> Analyze /proc/pid/maps and look for gaps in executable sections?

The kernel already does that for uretprobes: it adds a new "[uprobes]"
memory mapping, so this part is already implemented.

>
> We can start simple with a USDT that uses nop5 instead of nop1
> and an explicit single trampoline for all USDT locations
> that saves all (callee- and caller-saved) registers and
> then calls sys_bpf with a new cmd.
>
> To replace nop5 with a call to the trampoline, we can use the text_poke_bp
> approach: replace the 1st byte with int3, replace bytes 2-5 with the
> rel32 offset to the target, then replace the 1st byte to make an actual call insn.
>
> Once patched, there will be no simulation of insns or kernel traps.
> Just normal user code that calls into the trampoline, which calls sys_bpf
> and returns.
