On Tue, Mar 05, 2024 at 03:53:35PM -0800, Song Liu wrote:
> On Tue, Mar 5, 2024 at 9:18 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> >
> > On Fri, Mar 01, 2024 at 11:39:03AM -0800, Kui-Feng Lee wrote:
> > >
> > > On 2/29/24 06:39, Jiri Olsa wrote:
> > > > One of uprobe pain points is having slow execution that involves
> > > > two traps in worst case scenario or single trap if the original
> > > > instruction can be emulated. For return uprobes there's one extra
> > > > trap on top of that.
> > > >
> > > > My current idea on how to make this faster is to follow the optimized
> > > > kprobes and replace the normal uprobe trap instruction with jump to
> > > > user space trampoline that:
> > > >
> > > >   - executes syscall to call uprobe consumers callbacks
> > > >   - executes original instructions
> > > >   - jumps back to continue with the original code
> > > >
> > > > There are of course corner cases where above will have trouble or
> > > > won't work completely, like:
> > > >
> > > >   - executing original instructions in the trampoline is tricky wrt
> > > >     rip relative addressing
> > > >
> > > >   - some instructions we can't move to trampoline at all
> > > >
> > > >   - the uprobe address is on page boundary so the jump instruction to
> > > >     trampoline would span across 2 pages, hence the page replace won't
> > > >     be atomic, which might cause issues
> > > >
> > > >   - ... ? many others I'm sure
> > > >
> > > > Still with all the limitations I think we could be able to speed up
> > > > some amount of the uprobes, which seems worth doing.
> > >
> > > Just a random idea related to this.
> > > Could we also run jit code of bpf programs in the user space to collect
> > > information instead of going back to the kernel every time?
>
> I was thinking about a similar idea. I guess these user space BPF
> programs will have limited features that we can probably use them
> update bpf maps. For this limited scope, we still need bpf_arena.
> Otherwise, the user space bpf program will need to update the bpf
> maps with sys_bpf(), which adds the same overhead as triggering
> the program with a syscall.
>
> >
> > sorry for late reply, do you mean like ubpf? the scope of this change
> > is to speed up the generic uprobe, ebpf is just one of the consumers
>
> I guess this means we need a new syscall?

yes that's the idea, to replace the trap with syscall, so far I used
light version of that for initial testing [1]

jirka

[1] https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=uprobe_syscall_bench_1
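
For illustration only, the page-boundary corner case from the quoted
proposal can be sketched as a simple check. This is a minimal sketch and
not code from the branch in [1]: it assumes 4 KiB pages and a 5-byte
x86-64 "jmp rel32" as the instruction written over the probed address,
and the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumptions made for this sketch (not stated in the thread):
 * 4 KiB pages and a 5-byte x86-64 "jmp rel32" (0xe9 + 32-bit offset).
 */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_JMP_LEN   5ULL

/*
 * Hypothetical helper: returns true when a patch of `len` bytes written
 * at `addr` would touch two different pages, which is the case where a
 * single page replace cannot cover the whole jump instruction.
 */
static bool patch_spans_pages(uint64_t addr, uint64_t len)
{
	uint64_t first_page = addr / SKETCH_PAGE_SIZE;
	uint64_t last_page = (addr + len - 1) / SKETCH_PAGE_SIZE;

	return first_page != last_page;
}
```

Under these assumptions, a probe address for which
patch_spans_pages(addr, SKETCH_JMP_LEN) is true would presumably have to
stay on the regular trap-based path.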