Re: [PATCH bpf-next] uprobes: Fix the xol slots reserved for uretprobe trampoline

Hi, Jiri

On 2024/6/20 0:22, Jiri Olsa wrote:
> On Wed, Jun 19, 2024 at 01:34:11AM +0000, Liao Chang wrote:
>> When the new uretprobe system call was added [1], the xol slots reserved
>> for the uretprobe trampoline might be insufficient on some architectures.
> 
> hum, uretprobe syscall is x86_64 specific, nothing was changed wrt slots
> or other architectures.. could you be more specific in what's changed?

I observed a significant performance degradation when using uprobes to trace Redis
on an arm64 machine. redis-benchmark showed a drop of around 7% with uprobes
attached to two hot functions, and a much worse result with uprobes on more hot
functions. Here is a small snapshot of the benchmark results.

No uprobe
---------
SET: 73686.54 rps
GET: 73702.83 rps

Uprobes on two hot functions
----------------------------
SET: 68441.59 rps, -7.1%
GET: 68951.25 rps, -6.4%

Uprobes on three hot functions
------------------------------
SET: 40953.39 rps, -44.4%
GET: 41609.45 rps, -43.5%

To investigate potential improvements, I ported the uretprobe syscall and
trampoline feature to arm64. The trampoline code used on arm64 looks like this:

uretprobe_trampoline_for_arm64:
	str x8, [sp, #-8]!
	mov x8, __NR_uretprobe
	svc #0

Because arm64 uses fixed-length 4-byte instructions, the trampoline code is
12 bytes in total. Since the ixol slot size is typically 4 bytes, the trampoline
does not fit in a single slot, so more than one slot has to be reserved for it.
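
To make the arithmetic concrete, here is a minimal userspace sketch of the
slot count, not the kernel code itself. The helper name and the EXAMPLE_*
constants are made up for illustration, and rounding up is an assumption on
my side so that a trampoline whose size is not a multiple of the slot size
would still be fully covered:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only: 4-byte arm64 instructions and a 4-byte slot. */
#define EXAMPLE_INSN_BYTES      4u
#define EXAMPLE_XOL_SLOT_BYTES  4u

/*
 * Hypothetical helper: number of slots needed to hold a trampoline of
 * insns_size bytes, rounded up, and never less than one slot.
 */
static size_t trampoline_slots(size_t insns_size, size_t slot_bytes)
{
	size_t n = (insns_size + slot_bytes - 1) / slot_bytes;

	return n ? n : 1;
}

/* The three-instruction arm64 trampoline above needs three 4-byte slots. */
static void trampoline_slots_example(void)
{
	size_t insns_size = 3 * EXAMPLE_INSN_BYTES;	/* str + mov + svc */

	assert(trampoline_slots(insns_size, EXAMPLE_XOL_SLOT_BYTES) == 3);
}
```

With these assumed sizes, reserving only slot 0 would leave the last two
instructions of the trampoline in slots that can still be handed out.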

Thanks.

> 
> thanks,
> jirka
> 
>> For example, on arm64, the trampoline consists of at least three
>> instructions. So it should mark enough bits in area->bitmap and
>> area->slot_count for the reserved slots.
>>
>> [1] https://lore.kernel.org/all/20240611112158.40795-4-jolsa@xxxxxxxxxx/
>>
>> Signed-off-by: Liao Chang <liaochang1@xxxxxxxxxx>
>> ---
>>  kernel/events/uprobes.c | 11 +++++++----
>>  1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
>> index 2816e65729ac..efd2d7f56622 100644
>> --- a/kernel/events/uprobes.c
>> +++ b/kernel/events/uprobes.c
>> @@ -1485,7 +1485,7 @@ void * __weak arch_uprobe_trampoline(unsigned long *psize)
>>  static struct xol_area *__create_xol_area(unsigned long vaddr)
>>  {
>>  	struct mm_struct *mm = current->mm;
>> -	unsigned long insns_size;
>> +	unsigned long insns_size, slot_nr;
>>  	struct xol_area *area;
>>  	void *insns;
>>  
>> @@ -1508,10 +1508,13 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
>>  
>>  	area->vaddr = vaddr;
>>  	init_waitqueue_head(&area->wq);
>> -	/* Reserve the 1st slot for get_trampoline_vaddr() */
>> -	set_bit(0, area->bitmap);
>> -	atomic_set(&area->slot_count, 1);
>>  	insns = arch_uprobe_trampoline(&insns_size);
>> +	/* Reserve enough slots for the uretprobe trampoline */
>> +	for (slot_nr = 0;
>> +	     slot_nr < max((insns_size / UPROBE_XOL_SLOT_BYTES), 1);
>> +	     slot_nr++)
>> +		set_bit(slot_nr, area->bitmap);
>> +	atomic_set(&area->slot_count, slot_nr);
>>  	arch_uprobe_copy_ixol(area->pages[0], 0, insns, insns_size);
>>  
>>  	if (!xol_add_vma(mm, area))
>> -- 
>> 2.34.1
>>

-- 
BR
Liao, Chang



