Re: [PATCH 1/2] uprobes: Optimize the return_instance related routines

On Mon, Jul 8, 2024 at 6:00 PM Liao Chang <liaochang1@xxxxxxxxxx> wrote:
>
> Reduce the runtime overhead of the struct return_instance data managed
> by uretprobe. This patch replaces dynamic allocation with a statically
> allocated array, leveraging two facts: the nesting depth of uretprobe
> is limited (max 64), and return_instance usage follows a function call
> style (created at entry, freed at exit).
>
> This patch has been tested on Kunpeng916 (Hi1616), 4 NUMA nodes, 64
> cores @ 2.4GHz. Redis benchmarks show a throughput gain of about 2%
> for Redis GET and SET commands:
>
> ------------------------------------------------------------------
> Test case       | No uretprobes | uretprobes     | uretprobes
>                 |               | (current)      | (optimized)
> ==================================================================
> Redis SET (RPS) | 47025         | 40619 (-13.6%) | 41529 (-11.6%)
> ------------------------------------------------------------------
> Redis GET (RPS) | 46715         | 41426 (-11.3%) | 42306 (-9.4%)
> ------------------------------------------------------------------
>
> Signed-off-by: Liao Chang <liaochang1@xxxxxxxxxx>
> ---
>  include/linux/uprobes.h |  10 ++-
>  kernel/events/uprobes.c | 162 ++++++++++++++++++++++++----------------
>  2 files changed, 105 insertions(+), 67 deletions(-)
>

[...]

> +static void cleanup_return_instances(struct uprobe_task *utask, bool chained,
> +                                    struct pt_regs *regs)
> +{
> +       struct return_frame *frame = &utask->frame;
> +       struct return_instance *ri = frame->return_instance;
> +       enum rp_check ctx = chained ? RP_CHECK_CHAIN_CALL : RP_CHECK_CALL;
> +
> +       while (ri && !arch_uretprobe_is_alive(ri, ctx, regs)) {
> +               ri = next_ret_instance(frame, ri);
> +               utask->depth--;
> +       }
> +       frame->return_instance = ri;
> +}
> +
> +static struct return_instance *alloc_return_instance(struct uprobe_task *task)
> +{
> +       struct return_frame *frame = &task->frame;
> +
> +       if (!frame->vaddr) {
> +               frame->vaddr = kcalloc(MAX_URETPROBE_DEPTH,
> +                               sizeof(struct return_instance), GFP_KERNEL);

Are you just pre-allocating MAX_URETPROBE_DEPTH instances unconditionally?
I.e., even if we need just one (because there is no recursion), you'd
still waste memory on all 64 of them?

That seems rather wasteful.

Have you considered using objpool for fast reuse across multiple CPUs?
Check lib/objpool.c.

> +               if (!frame->vaddr)
> +                       return NULL;
> +       }
> +
> +       if (!frame->return_instance) {
> +               frame->return_instance = frame->vaddr;
> +               return frame->return_instance;
> +       }
> +
> +       return ++frame->return_instance;
> +}
> +
> +static inline bool return_frame_empty(struct uprobe_task *task)
> +{
> +       return !task->frame.return_instance;
>  }
>
>  /*

[...]
