On Tue, Nov 05, 2024 at 02:33:54PM +0100, Jiri Olsa wrote:
> hi,
> this patchset adds support to optimize usdt probes on top of 5-byte
> nop instruction.
>
> The generic approach (optimize all uprobes) is hard due to emulating
> possible multiple original instructions and its related issues. The
> usdt case, which stores 5-byte nop seems much easier, so starting
> with that.
>
> The basic idea is to replace breakpoint exception with syscall which
> is faster on x86_64. For more details please see changelog of patch 7.

So this is really about the fact that syscalls are faster than traps on
x86_64? Is there something similar on ARM64, or are they roughly the
same speed there?

That is, I don't think this scheme will work for the various RISC
architectures, given their very limited immediate range turns a typical
call into a multi-instruction trainwreck real quick.

Now, that isn't a problem if their exceptions and syscalls are of equal
speed.
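
To put rough numbers on the immediate-range concern, here is a minimal
sketch in C that checks whether a single direct call instruction at a
probe site can reach a given target. The helper names are hypothetical
and not taken from the patchset; the ranges are the architectural ones
for x86-64 `call rel32` (about +/-2GiB), arm64 `BL` (+/-128MiB) and
RISC-V `JAL` (+/-1MiB). Where the patchset actually places its
trampoline is not stated here, so the addresses in the self-check are
purely illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86-64: call rel32; displacement is relative to the end of the 5-byte insn. */
static bool x86_call_reaches(uint64_t site, uint64_t target)
{
	int64_t disp = (int64_t)(target - (site + 5));

	return disp >= INT32_MIN && disp <= INT32_MAX;
}

/* arm64: BL imm26; word-aligned offset relative to the insn itself, +/-128MiB. */
static bool arm64_bl_reaches(uint64_t site, uint64_t target)
{
	int64_t disp = (int64_t)(target - site);

	return (disp & 3) == 0 && disp >= -(1LL << 27) && disp < (1LL << 27);
}

/* RISC-V: JAL; 2-byte-aligned offset relative to the insn itself, +/-1MiB. */
static bool riscv_jal_reaches(uint64_t site, uint64_t target)
{
	int64_t disp = (int64_t)(target - site);

	return (disp & 1) == 0 && disp >= -(1LL << 20) && disp < (1LL << 20);
}

/* Illustrative only: a hypothetical trampoline 1GiB above the probe site. */
static void reach_selfcheck(void)
{
	uint64_t site = 0x400000;
	uint64_t tramp = site + (1ULL << 30);

	assert(x86_call_reaches(site, tramp));   /* fits in rel32 */
	assert(!arm64_bl_reaches(site, tramp));  /* beyond +/-128MiB */
	assert(!riscv_jal_reaches(site, tramp)); /* beyond +/-1MiB */
}
```

Once the target falls outside those windows, the RISC variants need to
materialize the address in a register first, which is where the
multi-instruction sequence comes from.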