Hi, Jiri

On 2024/6/20 0:22, Jiri Olsa wrote:
> On Wed, Jun 19, 2024 at 01:34:11AM +0000, Liao Chang wrote:
>> When the new uretprobe system call was added [1], the xol slots reserved
>> for the uretprobe trampoline might be insufficient on some architecture.
>
> hum, uretprobe syscall is x86_64 specific, nothing was changed wrt slots
> or other architectures.. could you be more specific in what's changed?

I observed a significant performance degradation when using uprobes to
trace Redis on an arm64 machine. redis-benchmark showed a drop of around
7% with uprobes attached to two hot functions, and a much worse result
with uprobes on more hot functions. Here is a small snapshot of the
benchmark results.

No uprobe
---------
SET: 73686.54 rps
GET: 73702.83 rps

Uprobes on two hot functions
----------------------------
SET: 68441.59 rps, -7.1%
GET: 68951.25 rps, -6.4%

Uprobes on three hot functions
------------------------------
SET: 40953.39 rps, -44.4%
GET: 41609.45 rps, -43.5%

To investigate potential improvements, I ported the uretprobe syscall
and trampoline feature to arm64. The trampoline code used on arm64 looks
like this:

uretprobe_trampoline_for_arm64:
	str x8, [sp, #-8]!
	mov x8, __NR_uretprobe
	svc #0

Because arm64 uses fixed-length 4-byte instructions, the total size of
the trampoline code is 12 bytes. Since the xol slot size on arm64 is
typically 4 bytes, the trampoline does not fit in a single slot, so more
than one slot has to be reserved for it (three in this case).

Thanks.

>
> thanks,
> jirka
>
>> For example, on arm64, the trampoline is consist of three instructions
>> at least. So it should mark enough bits in area->bitmaps and
>> and area->slot_count for the reserved slots.
>>
>> [1] https://lore.kernel.org/all/20240611112158.40795-4-jolsa@xxxxxxxxxx/
>>
>> Signed-off-by: Liao Chang <liaochang1@xxxxxxxxxx>
>> ---
>>  kernel/events/uprobes.c | 11 +++++++----
>>  1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
>> index 2816e65729ac..efd2d7f56622 100644
>> --- a/kernel/events/uprobes.c
>> +++ b/kernel/events/uprobes.c
>> @@ -1485,7 +1485,7 @@ void * __weak arch_uprobe_trampoline(unsigned long *psize)
>>  static struct xol_area *__create_xol_area(unsigned long vaddr)
>>  {
>>  	struct mm_struct *mm = current->mm;
>> -	unsigned long insns_size;
>> +	unsigned long insns_size, slot_nr;
>>  	struct xol_area *area;
>>  	void *insns;
>>
>> @@ -1508,10 +1508,13 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
>>
>>  	area->vaddr = vaddr;
>>  	init_waitqueue_head(&area->wq);
>> -	/* Reserve the 1st slot for get_trampoline_vaddr() */
>> -	set_bit(0, area->bitmap);
>> -	atomic_set(&area->slot_count, 1);
>>  	insns = arch_uprobe_trampoline(&insns_size);
>> +	/* Reserve enough slots for the uretprobe trampoline */
>> +	for (slot_nr = 0;
>> +	     slot_nr < max((insns_size / UPROBE_XOL_SLOT_BYTES), 1);
>> +	     slot_nr++)
>> +		set_bit(slot_nr, area->bitmap);
>> +	atomic_set(&area->slot_count, slot_nr);
>>  	arch_uprobe_copy_ixol(area->pages[0], 0, insns, insns_size);
>>
>>  	if (!xol_add_vma(mm, area))
>> --
>> 2.34.1
>>

--
BR
Liao, Chang
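
P.S. To make the slot arithmetic from my reply concrete, here is a small
user-space sketch of how many slots a trampoline of a given size would
occupy. It is only an illustration: XOL_SLOT_BYTES, xol_slots_needed()
and reserve_leading_slots() are hypothetical names, not kernel symbols,
and the bitmap is a single unsigned long rather than area->bitmap. The
sketch rounds the division up, which is an assumption on my side and not
what the quoted diff does (the diff divides with truncation).

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the 4-byte xol slot size assumed on arm64. */
#define XOL_SLOT_BYTES 4UL

/*
 * Number of slots needed to hold insns_size bytes of trampoline code.
 * Rounds up, and always reports at least one slot.
 */
unsigned long xol_slots_needed(unsigned long insns_size,
			       unsigned long slot_bytes)
{
	unsigned long nr;

	assert(slot_bytes > 0);
	nr = (insns_size + slot_bytes - 1) / slot_bytes;
	return nr ? nr : 1;
}

/*
 * Mark the first nr bits of a one-word bitmap as reserved and return
 * the number of reserved slots (what slot_count would start at).
 */
unsigned long reserve_leading_slots(unsigned long *bitmap, unsigned long nr)
{
	unsigned long i;

	assert(nr < sizeof(*bitmap) * CHAR_BIT);
	for (i = 0; i < nr; i++)
		*bitmap |= 1UL << i;
	return nr;
}
```

With the 12-byte arm64 trampoline above and 4-byte slots,
xol_slots_needed(12, XOL_SLOT_BYTES) gives 3, so bits 0..2 would be set
and the slot count would start at 3.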