Re: [PATCH bpf-next 1/3] bpf: introduce pinnable bpf_link abstraction

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Mon, 2 Mar 2020 15:37:32 -0800

On Mon, Mar 2, 2020 at 1:40 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:
>
> > On Mon, Mar 2, 2020 at 2:13 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> >>
> >> Andrii Nakryiko <andriin@xxxxxx> writes:
> >>
> >> > Introduce bpf_link abstraction, representing an attachment of a BPF program to
> >> > a BPF hook point (e.g., tracepoint, perf event, etc). bpf_link encapsulates
> >> > ownership of the attached BPF program and reference counting of the link itself
> >> > when it is referenced from multiple anonymous inodes, and ensures that the
> >> > release callback is called from a process context, so that users can safely
> >> > take mutex locks and sleep.
> >> >
> >> > Additionally, with the new abstraction it's now possible to generalize pinning
> >> > of a link object in BPF FS. Pinning a link explicitly prevents BPF program
> >> > detachment on process exit and lets another, independent process open the
> >> > link and keep working with it.
> >> >
> >> > Convert two existing bpf_link-like objects (raw tracepoint and tracing BPF
> >> > program attachments) to use the bpf_link framework, making them pinnable
> >> > in BPF FS. More FD-based bpf_links will be added in follow-up patches.
> >> >
> >> > Signed-off-by: Andrii Nakryiko <andriin@xxxxxx>
> >> > ---
> >> >  include/linux/bpf.h  |  13 +++
> >> >  kernel/bpf/inode.c   |  42 ++++++++-
> >> >  kernel/bpf/syscall.c | 209 ++++++++++++++++++++++++++++++++++++-------
> >> >  3 files changed, 226 insertions(+), 38 deletions(-)
> >> >

[...]

> >> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> >> > index c536c65256ad..fca8de7e7872 100644
> >> > --- a/kernel/bpf/syscall.c
> >> > +++ b/kernel/bpf/syscall.c
> >> > @@ -2173,23 +2173,153 @@ static int bpf_obj_get(const union bpf_attr *attr)
> >> >                               attr->file_flags);
> >> >  }
> >> >
> >> > -static int bpf_tracing_prog_release(struct inode *inode, struct file *filp)
> >> > +struct bpf_link {
> >> > +     atomic64_t refcnt;
> >>
> >> refcount_t ?
> >
> > Both bpf_map and bpf_prog stick to atomic64 for their refcounting, so
> > I'd like to stay consistent and use a refcount that can't possibly leak
> > resources (which refcount_t can, if it overflows).
>
> refcount_t is specifically supposed to turn a possible use-after-free on
> under/overflow into a warning, isn't it? Not going to insist or anything
> here, just found it odd that you'd prefer the other...

Well, underflow is a huge bug that should never happen in well-tested
code (at least that's the assumption for bpf_map and bpf_prog), and we
are generally very careful about that. Overflow can happen only because
refcount_t uses a 32-bit integer, which atomic64_t side-steps
completely by going to a 64-bit integer. So yeah, I'd rather stick to
the same stuff that's used for bpf_map and bpf_prog.

>
> -Toke
>