On Wed, Sep 6, 2023 at 5:37 AM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote: > > Hello, > > 在 2023/9/6 03:02, Alexei Starovoitov 写道: > > On Sun, Aug 27, 2023 at 12:21 AM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote: > >> > >> This Patch adds kfuncs bpf_iter_css_task_{new,next,destroy} which allow > >> creation and manipulation of struct bpf_iter_css_task in open-coded > >> iterator style. These kfuncs actually wrapps > >> css_task_iter_{start,next,end}. BPF programs can use these kfuncs through > >> bpf_for_each macro for iteration of all tasks under a css. > >> > >> Signed-off-by: Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> > >> --- > >> include/uapi/linux/bpf.h | 4 ++++ > >> kernel/bpf/helpers.c | 3 +++ > >> kernel/bpf/task_iter.c | 39 ++++++++++++++++++++++++++++++++++ > >> tools/include/uapi/linux/bpf.h | 4 ++++ > >> tools/lib/bpf/bpf_helpers.h | 7 ++++++ > >> 5 files changed, 57 insertions(+) > >> > >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > >> index 60a9d59beeab..2a6e9b99564b 100644 > >> --- a/include/uapi/linux/bpf.h > >> +++ b/include/uapi/linux/bpf.h > >> @@ -7195,4 +7195,8 @@ struct bpf_iter_num { > >> __u64 __opaque[1]; > >> } __attribute__((aligned(8))); > >> > >> +struct bpf_iter_css_task { > >> + __u64 __opaque[1]; > >> +} __attribute__((aligned(8))); > >> + > >> #endif /* _UAPI__LINUX_BPF_H__ */ > >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > >> index 9e80efa59a5d..cf113ad24837 100644 > >> --- a/kernel/bpf/helpers.c > >> +++ b/kernel/bpf/helpers.c > >> @@ -2455,6 +2455,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_slice_rdwr, KF_RET_NULL) > >> BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW) > >> BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL) > >> BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY) > >> +BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW) > >> +BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL) > >> +BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY) > >> BTF_ID_FLAGS(func, bpf_dynptr_adjust) > >> BTF_ID_FLAGS(func, bpf_dynptr_is_null) > >> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly) > >> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c > >> index c4ab9d6cdbe9..b1bdba40b684 100644 > >> --- a/kernel/bpf/task_iter.c > >> +++ b/kernel/bpf/task_iter.c > >> @@ -823,6 +823,45 @@ const struct bpf_func_proto bpf_find_vma_proto = { > >> .arg5_type = ARG_ANYTHING, > >> }; > >> > >> +struct bpf_iter_css_task_kern { > >> + struct css_task_iter *css_it; > >> +} __attribute__((aligned(8))); > >> + > >> +__bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it, > >> + struct cgroup_subsys_state *css, unsigned int flags) > >> +{ > >> + struct bpf_iter_css_task_kern *kit = (void *)it; > >> + > >> + BUILD_BUG_ON(sizeof(struct bpf_iter_css_task_kern) != sizeof(struct bpf_iter_css_task)); > >> + BUILD_BUG_ON(__alignof__(struct bpf_iter_css_task_kern) != > >> + __alignof__(struct bpf_iter_css_task)); > >> + > >> + kit->css_it = kzalloc(sizeof(struct css_task_iter), GFP_KERNEL); > >> + if (!kit->css_it) > >> + return -ENOMEM; > >> + css_task_iter_start(css, flags, kit->css_it); > > > > Some of the flags are internal. Like CSS_TASK_ITER_SKIPPED. > > The kfunc should probably only allow CSS_TASK_ITER_PROCS | > > CSS_TASK_ITER_THREADED, > > and not CSS_TASK_ITER_THREADED alone. > > > > Since they're #define-s it's not easy for bpf prog to use them. > > I think would be good to have a pre-patch that converts them to enum, > > so that bpf prog can take them from vmlinux.h. > > > > > > But the main issue of the patch that it adds this iter to common kfuncs. > > That's not safe, since css_task_iter_*() does spin_unlock_irq() which > > might screw up irq flags depending on the context where bpf prog is running. > > Can css_task_iter internals switch to irqsave/irqrestore? > > Yes, I think so. Switching to irqsave/irqrestore is no harm. > > > css_set_lock is also global, so the bpf side has to be careful in > > where it allows to use this iter. > > bpf_lsm hooks are safe, most of bpf iter-s are safe too. > > Future bpf-oom hooks are probably safe as well. > > We probably need an allowlist here. > > What should we do if we want to make a allowlist? > Do you mean we need to check prog_type or attach_type when we call these > kfuncs in BPF verifier? If so, we should add a new attach_type or > prog_type for bpf-oom in the feature so we can know the current BPF > program is hooking for OOM Policy. bpf-oom type can be added later. Let's make this one work for bpf-lsm and sleepable iter-s first. See SEC("iter.s