On Fri, Sep 15, 2023 at 4:57 AM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote: > > Hello. > > 在 2023/9/15 07:26, Andrii Nakryiko 写道: > > On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote: > >> > >> This Patch adds kfuncs bpf_iter_css_{pre,post}_{new,next,destroy} which > >> allow creation and manipulation of struct bpf_iter_css in open-coded > >> iterator style. These kfuncs actually wrapps css_next_descendant_{pre, > >> post}. BPF programs can use these kfuncs through bpf_for_each macro for > >> iteration of all descendant css under a root css. > >> > >> Signed-off-by: Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> > >> --- > >> include/uapi/linux/bpf.h | 8 +++++ > >> kernel/bpf/helpers.c | 6 ++++ > >> kernel/bpf/task_iter.c | 53 ++++++++++++++++++++++++++++++++++ > >> tools/include/uapi/linux/bpf.h | 8 +++++ > >> tools/lib/bpf/bpf_helpers.h | 12 ++++++++ > >> 5 files changed, 87 insertions(+) > >> > >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > >> index befa55b52e29..57760afc13d0 100644 > >> --- a/include/uapi/linux/bpf.h > >> +++ b/include/uapi/linux/bpf.h > >> @@ -7326,4 +7326,12 @@ struct bpf_iter_process { > >> __u64 __opaque[1]; > >> } __attribute__((aligned(8))); > >> > >> +struct bpf_iter_css_pre { > >> + __u64 __opaque[2]; > >> +} __attribute__((aligned(8))); > >> + > >> +struct bpf_iter_css_post { > >> + __u64 __opaque[2]; > >> +} __attribute__((aligned(8))); > >> + > >> #endif /* _UAPI__LINUX_BPF_H__ */ > >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > >> index 9b7d2c6f99d1..ca1f6404af9e 100644 > >> --- a/kernel/bpf/helpers.c > >> +++ b/kernel/bpf/helpers.c > >> @@ -2510,6 +2510,12 @@ BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY) > >> BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW) > >> BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL) > >> BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY) > >> +BTF_ID_FLAGS(func, bpf_iter_css_pre_new, KF_ITER_NEW) > >> +BTF_ID_FLAGS(func, bpf_iter_css_pre_next, KF_ITER_NEXT | KF_RET_NULL) > >> +BTF_ID_FLAGS(func, bpf_iter_css_pre_destroy, KF_ITER_DESTROY) > >> +BTF_ID_FLAGS(func, bpf_iter_css_post_new, KF_ITER_NEW) > >> +BTF_ID_FLAGS(func, bpf_iter_css_post_next, KF_ITER_NEXT | KF_RET_NULL) > >> +BTF_ID_FLAGS(func, bpf_iter_css_post_destroy, KF_ITER_DESTROY) > >> BTF_ID_FLAGS(func, bpf_dynptr_adjust) > >> BTF_ID_FLAGS(func, bpf_dynptr_is_null) > >> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly) > >> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c > >> index 9d1927dc3a06..8963fc779b87 100644 > >> --- a/kernel/bpf/task_iter.c > >> +++ b/kernel/bpf/task_iter.c > >> @@ -880,6 +880,59 @@ __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it) > >> { > >> } > >> > >> +struct bpf_iter_css_kern { > >> + struct cgroup_subsys_state *root; > >> + struct cgroup_subsys_state *pos; > >> +} __attribute__((aligned(8))); > >> + > >> +__bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it, > >> + struct cgroup_subsys_state *root) > > > > similar to my comment on previous patches, please see > > kernel/bpf/cgroup_iter.c for iter/cgroup iterator program. Let's stay > > consistent. We have one iterator that accepts parameters defining > > iteration order and starting cgroup. Unless there are some technical > > reasons we can't follow similar approach with this open-coded iter, > > let's use the same approach. We can even reuse > > BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST, > > BPF_CGROUP_ITER_ANCESTORS_UP enums. > > > > I know your concern. It would be nice if we keep consistent with > kernel/bpf/cgroup_iter.c > > But this patch actually want to support iterating css > (cgroup_subsys_state) not cgroup (css is more low lever). > With css_iter we can do something like > "for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre" > in BPF Progs which is hard for cgroup_iter. In the future we can use > this iterator to plug some customizable policy in other resource control > system. That's fine if it's not exactly cgroup iter and returns a different kernel object. But let's at least consistently use BPF_CGROUP_ITER_DESCENDANTS_PRE/BPF_CGROUP_ITER_DESCENDANTS_POST/BPF_CGROUP_ITER_ANCESTORS_UP approach as a way to specify iteration order? > > BTW, what I did in RFC actually very similar with the approach of > cgroup_iter. > (https://lore.kernel.org/all/20230827072057.1591929-4-zhouchuyi@xxxxxxxxxxxxx/). > > Thanks.