Re: [PATCH v3 bpf-next 2/3] bpf: Introduce task_vma open-coded iterator kfuncs

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Tue, 22 Aug 2023 15:36:50 -0700

On Tue, Aug 22, 2023 at 1:14 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
>
>
> On 8/22/23 12:19 PM, David Marchevsky wrote:
> > On 8/22/23 1:42 PM, Yonghong Song wrote:
> >>
> >>
> >> On 8/21/23 10:05 PM, Dave Marchevsky wrote:
> >>> This patch adds kfuncs bpf_iter_task_vma_{new,next,destroy} which allow
> >>> creation and manipulation of struct bpf_iter_task_vma in open-coded
> >>> iterator style. BPF programs can use these kfuncs directly or through
> >>> bpf_for_each macro for natural-looking iteration of all task vmas.
> >>>
> >>> The implementation borrows heavily from bpf_find_vma helper's locking -
> >>> differing only in that it holds the mmap_read lock for all iterations
> >>> while the helper only executes its provided callback on a maximum of 1
> >>> vma. Aside from locking, struct vma_iterator and vma_next do all the
> >>> heavy lifting.
> >>>
> >>> The newly-added struct bpf_iter_task_vma has a name collision with a
> >>> selftest for the seq_file task_vma iter's bpf skel, so the selftests/bpf/progs
> >>> file is renamed in order to avoid the collision.
> >>>
> >>> A pointer to an inner data struct, struct bpf_iter_task_vma_kern_data, is the
> >>> only field in struct bpf_iter_task_vma. This is because the inner data
> >>> struct contains a struct vma_iterator (not ptr), whose size is likely to
> >>> change under us. If bpf_iter_task_vma_kern contained vma_iterator directly
> >>> such a change would require change in opaque bpf_iter_task_vma struct's
> >>> size. So better to allocate vma_iterator using BPF allocator, and since
> >>> that alloc must already succeed, might as well allocate all iter fields,
> >>> thereby freezing struct bpf_iter_task_vma size.
> >>>
> >>> Signed-off-by: Dave Marchevsky <davemarchevsky@xxxxxx>
> >>> Cc: Nathan Slingerland <slinger@xxxxxxxx>
> >>> ---
> >>>    include/uapi/linux/bpf.h                      |  4 +
> >>>    kernel/bpf/helpers.c                          |  3 +
> >>>    kernel/bpf/task_iter.c                        | 84 +++++++++++++++++++
> >>>    tools/include/uapi/linux/bpf.h                |  4 +
> >>>    tools/lib/bpf/bpf_helpers.h                   |  8 ++
> >>>    .../selftests/bpf/prog_tests/bpf_iter.c       | 26 +++---
> >>>    ...f_iter_task_vma.c => bpf_iter_task_vmas.c} |  0
> >>>    7 files changed, 116 insertions(+), 13 deletions(-)
> >>>    rename tools/testing/selftests/bpf/progs/{bpf_iter_task_vma.c => bpf_iter_task_vmas.c} (100%)
> >>>
> >>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >>> index 8790b3962e4b..49fc1989a548 100644
> >>> --- a/include/uapi/linux/bpf.h
> >>> +++ b/include/uapi/linux/bpf.h
> >>> @@ -7311,4 +7311,8 @@ struct bpf_iter_num {
> >>>        __u64 __opaque[1];
> >>>    } __attribute__((aligned(8)));
> >>>    +struct bpf_iter_task_vma {
> >>> +    __u64 __opaque[1]; /* See bpf_iter_num comment above */
> >>> +} __attribute__((aligned(8)));
> >>
> >> In the future, we might have bpf_iter_cgroup, bpf_iter_task, bpf_iter_cgroup_task, etc. They may all use the same struct
> >> like
> >>    struct bpf_iter_<...> {
> >>      __u64 __opaque[1];
> >>    } __attribute__((aligned(8)));
> >>
> >> Maybe we want a generic one instead of having lots of
> >> structs with the same underline definition? For example,
> >>    struct bpf_iter_generic
> >> ?
> >>
> >
> > The bpf_for_each macro assumes a consistent naming scheme for opaque iter struct
> > and associated kfuncs. Having a 'bpf_iter_generic' shared amongst multiple types
> > of iters would break the scheme. We could:
> >
> >    * Add bpf_for_each_generic that only uses bpf_iter_generic
> >      * This exposes implementation details in an ugly way, though.
> >    * Do some macro magic to pick bpf_iter_generic for some types of iters, and
> >      use consistent naming pattern for others.
> >      * I'm not sure how to do this with preprocessor
> >    * Migrate all opaque iter structs to only contain pointer to bpf_mem_alloc'd
> >      data struct, and use bpf_iter_generic for all of them
> >      * Probably need to see more iter implementation / usage before making such
> >        a change
> >    * Do 'typedef __u64 __aligned(8) bpf_iter_<...>
> >      * BTF_KIND_TYPEDEF intead of BTF_KIND_STRUCT might throw off some verifier
> >        logic. Could do similar typedef w/ struct to try to work around
> >        it.
> >
> > Let me know what you think. Personally I considered doing typedef while
> > implementing this, so that's the alternative I'd choose.
>
> Okay, since we have naming convention restriction, typedef probably the
> best option, something like
>    typedef struct bpf_iter_num bpf_iter_task_vma
> ?
>
> Verifier might need to be changed if verifier strips all modifiers
> (including tyypedef) to find the struct name.

I don't quite see how typedef helps here.
Say we do:
struct bpf_iter_task_vma {
     __u64 __opaque[1];
} __attribute__((aligned(8)));

as Dave is proposing.
Then tomorrow we add another bpf_iter_foo that is exactly the same opaque[1].
And we will have bpf_iter_num, bpf_iter_task_vma, bpf_iter_foo structs
with the same layout. So what? Eye sore?
In case we need to extend task_vma from 1 to 2 it will be easier to do
when all of them are separate structs.

And typedef has unknown verification implications.

Either way we need to find a way to move these structs from uapi/bpf.h
along with bpf_rb_root and friends to some "obviously unstable" header.