Re: [PATCH RFC bpf-next] bpf: defer bpf_link dealloc to after RCU grace period

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Mar 25, 2024 at 2:51 PM Andrii Nakryiko <andrii@xxxxxxxxxx> wrote:
>
> BPF link for some program types is passed as a "context" which can be
> used by those BPF programs to look up additional information. E.g., for
> BPF raw tracepoints, link is used to fetch BPF cookie value, similarly
> for BPF multi-kprobes and multi-uprobes.
>
> Because of this runtime dependency, when bpf_link refcnt drops to zero
> that could be still active BPF programs running accessing link data
> (cookie, program pointer, etc).
>
> This patch accommodates this by delaying freeing memory to after RCU GP,
> which will fix BPF raw tp, multi-kprobe, and non-sleepable multi-uprobe.
>
> Perhaps a better approach would be to have a per-link flag specifying
> desired behavior: no delay, RCU delay, or task_trace RCU delay? So
> sending this as an RFC fix to discuss desired final solution.
>
> Fixes: d4dfc5700e86 ("bpf: pass whole link instead of prog when triggering raw tracepoint")
> Reported-by: syzbot+981935d9485a560bfbcb@xxxxxxxxxxxxxxxxxxxxxxxxx
> Reported-by: syzbot+2cb5a6c573e98db598cc@xxxxxxxxxxxxxxxxxxxxxxxxx
> Reported-by: syzbot+62d8b26793e8a2bd0516@xxxxxxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> ---
>  include/linux/bpf.h  |  8 +++++++-
>  kernel/bpf/syscall.c | 12 ++++++++++--
>  2 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 62762390c93d..d73a8978c800 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1573,7 +1573,13 @@ struct bpf_link {
>         enum bpf_link_type type;
>         const struct bpf_link_ops *ops;
>         struct bpf_prog *prog;
> -       struct work_struct work;
> +       /* rcu is used before freeing, work can be used to schedule that
> +        * RCU-based freeing before that, so they never overlap
> +        */
> +       union {
> +               struct rcu_head rcu;
> +               struct work_struct work;
> +       };
>  };
>
>  struct bpf_link_ops {
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index e44c276e8617..af1591af10bb 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3024,6 +3024,14 @@ void bpf_link_inc(struct bpf_link *link)
>         atomic64_inc(&link->refcnt);
>  }
>
> +static void bpf_link_dealloc_deferred(struct rcu_head *rcu)
> +{
> +       struct bpf_link *link = container_of(rcu, struct bpf_link, rcu);
> +
> +       /* free bpf_link and its containing memory */
> +       link->ops->dealloc(link);
> +}
> +
>  /* bpf_link_free is guaranteed to be called from process context */
>  static void bpf_link_free(struct bpf_link *link)
>  {
> @@ -3033,8 +3041,8 @@ static void bpf_link_free(struct bpf_link *link)
>                 link->ops->release(link);
>                 bpf_prog_put(link->prog);
>         }
> -       /* free bpf_link and its containing memory */
> -       link->ops->dealloc(link);
> +       /* schedule BPF link deallocation after RCU grace period */
> +       call_rcu(&link->rcu, bpf_link_dealloc_deferred);

Do we have any sleepable progs that access link as well?
If so then call_rcu_tasks_trace ?
and check rcu_trace_implies_rcu_gp to avoid extra call_rcu.

Doing that for all link types like networking that don't look
into link is not great. Probably needs to look into link type?
Or look into prog->type and do the extra work only for raw_tp,
multi-[ku]probe ?





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux