On Tue, Aug 22, 2023 at 1:14 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote: > > > > On 8/22/23 12:19 PM, David Marchevsky wrote: > > On 8/22/23 1:42 PM, Yonghong Song wrote: > >> > >> > >> On 8/21/23 10:05 PM, Dave Marchevsky wrote: > >>> This patch adds kfuncs bpf_iter_task_vma_{new,next,destroy} which allow > >>> creation and manipulation of struct bpf_iter_task_vma in open-coded > >>> iterator style. BPF programs can use these kfuncs directly or through > >>> bpf_for_each macro for natural-looking iteration of all task vmas. > >>> > >>> The implementation borrows heavily from bpf_find_vma helper's locking - > >>> differing only in that it holds the mmap_read lock for all iterations > >>> while the helper only executes its provided callback on a maximum of 1 > >>> vma. Aside from locking, struct vma_iterator and vma_next do all the > >>> heavy lifting. > >>> > >>> The newly-added struct bpf_iter_task_vma has a name collision with a > >>> selftest for the seq_file task_vma iter's bpf skel, so the selftests/bpf/progs > >>> file is renamed in order to avoid the collision. > >>> > >>> A pointer to an inner data struct, struct bpf_iter_task_vma_kern_data, is the > >>> only field in struct bpf_iter_task_vma. This is because the inner data > >>> struct contains a struct vma_iterator (not ptr), whose size is likely to > >>> change under us. If bpf_iter_task_vma_kern contained vma_iterator directly > >>> such a change would require change in opaque bpf_iter_task_vma struct's > >>> size. So better to allocate vma_iterator using BPF allocator, and since > >>> that alloc must already succeed, might as well allocate all iter fields, > >>> thereby freezing struct bpf_iter_task_vma size. > >>> > >>> Signed-off-by: Dave Marchevsky <davemarchevsky@xxxxxx> > >>> Cc: Nathan Slingerland <slinger@xxxxxxxx> > >>> --- > >>> include/uapi/linux/bpf.h | 4 + > >>> kernel/bpf/helpers.c | 3 + > >>> kernel/bpf/task_iter.c | 84 +++++++++++++++++++ > >>> tools/include/uapi/linux/bpf.h | 4 + > >>> tools/lib/bpf/bpf_helpers.h | 8 ++ > >>> .../selftests/bpf/prog_tests/bpf_iter.c | 26 +++--- > >>> ...f_iter_task_vma.c => bpf_iter_task_vmas.c} | 0 > >>> 7 files changed, 116 insertions(+), 13 deletions(-) > >>> rename tools/testing/selftests/bpf/progs/{bpf_iter_task_vma.c => bpf_iter_task_vmas.c} (100%) > >>> > >>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > >>> index 8790b3962e4b..49fc1989a548 100644 > >>> --- a/include/uapi/linux/bpf.h > >>> +++ b/include/uapi/linux/bpf.h > >>> @@ -7311,4 +7311,8 @@ struct bpf_iter_num { > >>> __u64 __opaque[1]; > >>> } __attribute__((aligned(8))); > >>> +struct bpf_iter_task_vma { > >>> + __u64 __opaque[1]; /* See bpf_iter_num comment above */ > >>> +} __attribute__((aligned(8))); > >> > >> In the future, we might have bpf_iter_cgroup, bpf_iter_task, bpf_iter_cgroup_task, etc. They may all use the same struct > >> like > >> struct bpf_iter_<...> { > >> __u64 __opaque[1]; > >> } __attribute__((aligned(8))); > >> > >> Maybe we want a generic one instead of having lots of > >> structs with the same underline definition? For example, > >> struct bpf_iter_generic > >> ? > >> > > > > The bpf_for_each macro assumes a consistent naming scheme for opaque iter struct > > and associated kfuncs. Having a 'bpf_iter_generic' shared amongst multiple types > > of iters would break the scheme. We could: > > > > * Add bpf_for_each_generic that only uses bpf_iter_generic > > * This exposes implementation details in an ugly way, though. > > * Do some macro magic to pick bpf_iter_generic for some types of iters, and > > use consistent naming pattern for others. > > * I'm not sure how to do this with preprocessor > > * Migrate all opaque iter structs to only contain pointer to bpf_mem_alloc'd > > data struct, and use bpf_iter_generic for all of them > > * Probably need to see more iter implementation / usage before making such > > a change > > * Do 'typedef __u64 __aligned(8) bpf_iter_<...> > > * BTF_KIND_TYPEDEF intead of BTF_KIND_STRUCT might throw off some verifier > > logic. Could do similar typedef w/ struct to try to work around > > it. > > > > Let me know what you think. Personally I considered doing typedef while > > implementing this, so that's the alternative I'd choose. > > Okay, since we have naming convention restriction, typedef probably the > best option, something like > typedef struct bpf_iter_num bpf_iter_task_vma > ? > > Verifier might need to be changed if verifier strips all modifiers > (including tyypedef) to find the struct name. I don't quite see how typedef helps here. Say we do: struct bpf_iter_task_vma { __u64 __opaque[1]; } __attribute__((aligned(8))); as Dave is proposing. Then tomorrow we add another bpf_iter_foo that is exactly the same opaque[1]. And we will have bpf_iter_num, bpf_iter_task_vma, bpf_iter_foo structs with the same layout. So what? Eye sore? In case we need to extend task_vma from 1 to 2 it will be easier to do when all of them are separate structs. And typedef has unknown verification implications. Either way we need to find a way to move these structs from uapi/bpf.h along with bpf_rb_root and friends to some "obviously unstable" header.