On Mon, 09 Sep 2024 07:11:14 +0000, Liao Chang wrote:
> v2->v1:
> 1. Remove the simulation of STP and the related bits.
> 2. Use arm64_skip_faulting_instruction for single-stepping or FEAT_BTI
>    scenario.
>
> As Andrii pointed out, the uprobe/uretprobe selftest bench runs into a
> counterintuitive result: the nop and push variants are much slower than
> the ret variant [0]. The root cause lies in arch_probe_analyse_insn(),
> which excludes 'nop' and 'stp' from the list of emulatable instructions.
> This forces the kernel to return to userspace and execute them out-of-line,
> then trap back into the kernel to run the uprobe callback functions. This
> leads to significant performance overhead compared to the 'ret' variant,
> which is already emulated.
>
> [...]

Applied to arm64 (for-next/probes), thanks! I fixed it up according to
Mark's comments.

[1/1] arm64: insn: Simulate nop instruction for better uprobe performance
      https://git.kernel.org/arm64/c/ac4ad5c09b34

-- 
Catalin