Re: [LSF/MM/BPF TOPIC] faster uprobes

On Fri, Mar 1, 2024 at 12:18 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>
> On Thu, Feb 29, 2024 at 04:25:17PM -0800, Andrii Nakryiko wrote:
> > On Thu, Feb 29, 2024 at 6:39 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> > >
> > > One of uprobe pain points is having slow execution that involves
> > > two traps in worst case scenario or single trap if the original
> > > instruction can be emulated. For return uprobes there's one extra
> > > trap on top of that.
> > >
> > > My current idea on how to make this faster is to follow the optimized
> > > kprobes and replace the normal uprobe trap instruction with jump to
> > > user space trampoline that:
> > >
> > >   - executes syscall to call uprobe consumers callbacks
> >
> > Did you get a chance to measure relative performance of syscall vs
> > int3 interrupt handling? If not, do you think you'll be able to get
> > some numbers by the time the conference starts? This should inform the
> > decision whether it even makes sense to go through all the trouble.
>
> right, will do that

I believe Yusheng measured the syscall vs uprobe performance
difference during LPC. IIRC it was something like 3x.
We certainly need a benchmark.
selftests/bpf/bench already has one for uprobes.
It should probably be extended with a sys_bpf case.

Regarding:
> replace the normal uprobe trap instruction with jump to
> user space trampoline

It should probably be a call to the trampoline instead of a jump,
unless you plan to generate a different trampoline for every location?

Also, how would you pick a spot for the trampoline in the target process?
Analyze /proc/pid/maps and look for gaps in executable sections?
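
To make the /proc/pid/maps question concrete, here is a rough sketch of
what such a gap search could look like. The helper name
find_trampoline_gap() and its policy (take the first unmapped hole whose
start is within rel32 reach of the probe site) are assumptions for
illustration only, not something proposed in this thread. It takes the
maps text as a string so it can be tried without a live process.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan text in /proc/<pid>/maps format and return
 * the start of the first unmapped gap of at least `need` bytes whose
 * start lies within rel32 reach of `site`. Returns 0 if none is found.
 * Simplification: only the start of each gap is considered.
 */
static uint64_t find_trampoline_gap(const char *maps, uint64_t site,
                                    uint64_t need)
{
    const uint64_t reach = 0x7fffffffULL; /* max magnitude of a rel32 */
    uint64_t prev_end = 0;
    const char *line = maps;

    while (line && *line) {
        unsigned long long start, end;

        /* Each line begins with "start-end" in hex. */
        if (sscanf(line, "%llx-%llx", &start, &end) == 2) {
            if (prev_end != 0 && start > prev_end &&
                start - prev_end >= need) {
                uint64_t dist = prev_end > site ? prev_end - site
                                                : site - prev_end;
                /* Conservative check: whole trampoline stays in reach. */
                if (dist + need <= reach)
                    return prev_end;
            }
            prev_end = end;
        }
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return 0;
}
```

The reach check matters because a 5-byte call encodes a signed 32-bit
displacement, so a single trampoline has to live within roughly 2GB of
every site that calls it.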

We can start simple with a USDT that uses nop5 instead of nop1
and an explicit single trampoline for all USDT locations
that saves all (callee- and caller-saved) registers and
then does sys_bpf with a new cmd.

To replace nop5 with a call to the trampoline we can use the text_poke_bp
approach: replace the 1st byte with int3, replace bytes 2-5 with the
target addr (as a rel32 offset), then replace the 1st byte again to make
it an actual call insn.
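
A minimal user-space sketch of that three-step byte sequence, operating
on a plain buffer rather than live text. The names patch_nop5_to_call()
and call_rel32() are made up for illustration, and a little-endian host
is assumed. This only shows the order of the writes; real patching would
also have to make each step visible to other CPUs before moving on to
the next one, which the sketch does not attempt.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INT3_OPCODE    0xCC
#define CALL_OPCODE    0xE8
#define CALL_INSN_SIZE 5

/* x86-64 5-byte nop: nopl 0x0(%rax,%rax,1) */
static const uint8_t nop5[CALL_INSN_SIZE] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* rel32 is relative to the address of the instruction after the call. */
static int32_t call_rel32(uint64_t site, uint64_t trampoline)
{
    int64_t disp = (int64_t)(trampoline - (site + CALL_INSN_SIZE));

    /* The trampoline must be reachable with a signed 32-bit offset. */
    assert(disp >= INT32_MIN && disp <= INT32_MAX);
    return (int32_t)disp;
}

/* Hypothetical illustration of the int3-first patching order. */
static void patch_nop5_to_call(uint8_t *insn, uint64_t site,
                               uint64_t trampoline)
{
    int32_t rel = call_rel32(site, trampoline);

    /* Step 1: a thread hitting the site now traps instead of
     * executing a half-written instruction. */
    insn[0] = INT3_OPCODE;

    /* Step 2: fill in bytes 2-5 with the call displacement. */
    memcpy(&insn[1], &rel, sizeof(rel));

    /* Step 3: replace int3 with the call opcode; the insn is complete. */
    insn[0] = CALL_OPCODE;
}
```

For example, with the site at 0x401000 and the trampoline at 0x402000
the displacement is 0x402000 - 0x401005 = 0xffb.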

Once patched there will be no simulation of insns or kernel traps.
Just normal user code that calls into the trampoline, which calls sys_bpf
and returns back.
