Re: [PATCH bpf-next v2 03/20] bpf: support bpf tracing/iter programs for BPF_LINK_CREATE

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 5 May 2020 20:09:51 -0700

On Tue, May 5, 2020 at 5:54 PM Alexei Starovoitov <ast@xxxxxx> wrote:
>
> On 5/5/20 5:14 PM, Yonghong Song wrote:
> >
> >
> > On 5/5/20 2:30 PM, Andrii Nakryiko wrote:
> >> On Sun, May 3, 2020 at 11:26 PM Yonghong Song <yhs@xxxxxx> wrote:
> >>>
> >>> Given a bpf program, the step to create an anonymous bpf iterator is:
> >>>    - create a bpf_iter_link, which combines bpf program and the target.
> >>>      In the future, there could be more information recorded in the
> >>> link.
> >>>      A link_fd will be returned to the user space.
> >>>    - create an anonymous bpf iterator with the given link_fd.
> >>>
> >>> The bpf_iter_link can be pinned to bpffs mount file system to
> >>> create a file based bpf iterator as well.
> >>>
> >>> The benefit to use of bpf_iter_link:
> >>>    - using bpf link simplifies design and implementation as bpf link
> >>>      is used for other tracing bpf programs.
> >>>    - for file based bpf iterator, bpf_iter_link provides a standard
> >>>      way to replace underlying bpf programs.
> >>>    - for both anonymous and free based iterators, bpf link query
> >>>      capability can be leveraged.
> >>>
> >>> The patch added support of tracing/iter programs for BPF_LINK_CREATE.
> >>> A new link type BPF_LINK_TYPE_ITER is added to facilitate link
> >>> querying. Currently, only prog_id is needed, so there is no
> >>> additional in-kernel show_fdinfo() and fill_link_info() hook
> >>> is needed for BPF_LINK_TYPE_ITER link.
> >>>
> >>> Signed-off-by: Yonghong Song <yhs@xxxxxx>
> >>> ---
> >>
> >> LGTM. See small nit about __GFP_NOWARN.
> >>
> >> Acked-by: Andrii Nakryiko <andriin@xxxxxx>
> >>
> >>
> >>>   include/linux/bpf.h            |  1 +
> >>>   include/linux/bpf_types.h      |  1 +
> >>>   include/uapi/linux/bpf.h       |  1 +
> >>>   kernel/bpf/bpf_iter.c          | 62 ++++++++++++++++++++++++++++++++++
> >>>   kernel/bpf/syscall.c           | 14 ++++++++
> >>>   tools/include/uapi/linux/bpf.h |  1 +
> >>>   6 files changed, 80 insertions(+)
> >>>
> >>
> >> [...]
> >>
> >>> +int bpf_iter_link_attach(const union bpf_attr *attr, struct bpf_prog
> >>> *prog)
> >>> +{
> >>> +       struct bpf_link_primer link_primer;
> >>> +       struct bpf_iter_target_info *tinfo;
> >>> +       struct bpf_iter_link *link;
> >>> +       bool existed = false;
> >>> +       u32 prog_btf_id;
> >>> +       int err;
> >>> +
> >>> +       if (attr->link_create.target_fd || attr->link_create.flags)
> >>> +               return -EINVAL;
> >>> +
> >>> +       prog_btf_id = prog->aux->attach_btf_id;
> >>> +       mutex_lock(&targets_mutex);
> >>> +       list_for_each_entry(tinfo, &targets, list) {
> >>> +               if (tinfo->btf_id == prog_btf_id) {
> >>> +                       existed = true;
> >>> +                       break;
> >>> +               }
> >>> +       }
> >>> +       mutex_unlock(&targets_mutex);
> >>> +       if (!existed)
> >>> +               return -ENOENT;
> >>> +
> >>> +       link = kzalloc(sizeof(*link), GFP_USER | __GFP_NOWARN);
> >>
> >> nit: all existing link implementation don't specify __GFP_NOWARN,
> >> wonder if bpf_iter_link should be special?
> >
> > Nothing special. Just feel __GFP_NOWARN is the right thing to do to
> > avoid pollute dmesg since -ENOMEM is returned to user space. But in
> > reality, unlike some key/value allocation where the size could be huge
> > and __GFP_NOWARN might be more useful, here, sizeof(*link) is fixed
> > and small, __GFP_NOWARN probably not that useful.
> >
> > Will drop it.
>
> actually all existing user space driven allocation have nowarn.

Can you define "user space driven"? I understand why for map, map key,
map value, program we want to do that, because it's way too easy for
user-space to specify huge sizes and allocation is proportional to
that size. But in this case links are fixed-sized objects, same as
struct file and struct inode. From BPF world, for instance, there is
struct bpf_prog_list, which is created when user is attaching BPF
program to cgroup, so it is user-space driven in similar sense. Yet we
allocate it without __GFP_NOWARN.

> If we missed it in other link allocs we should probably add it.

Before bpf_link was formalized, raw_tracepoint_open was creating
struct bpf_raw_tracepoint, without NOWARN.