On Thu, Jan 16, 2025 at 11:47 AM Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote:
>
> This patch modifies SCX to use BPF capabilities.
>
> Make all SCX kfuncs register to BPF capabilities instead of
> BPF_PROG_TYPE_STRUCT_OPS.
>
> Add bpf_scx_bpf_capabilities_adjust as bpf_capabilities_adjust
> callback function.
>
> Signed-off-by: Juntong Deng <juntong.deng@xxxxxxxxxxx>
> ---
>  kernel/sched/ext.c | 74 ++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 62 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 7fff1d045477..53cc7c3ed80b 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -5765,10 +5765,66 @@ bpf_scx_get_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  	}
>  }

The 'capabilities' name doesn't fit. The word already has its meaning
in the kernel. It cannot be reused for a different purpose.

> +static int bpf_scx_bpf_capabilities_adjust(unsigned long *bpf_capabilities,
> +					   u32 context_info, bool enter)
> +{
> +	if (enter) {
> +		switch (context_info) {
> +		case offsetof(struct sched_ext_ops, select_cpu):
> +			ENABLE_BPF_CAPABILITY(bpf_capabilities, BPF_CAP_SCX_KF_SELECT_CPU);
> +			ENABLE_BPF_CAPABILITY(bpf_capabilities, BPF_CAP_SCX_KF_ENQUEUE);
> +			break;
> +		case offsetof(struct sched_ext_ops, enqueue):
> +			ENABLE_BPF_CAPABILITY(bpf_capabilities, BPF_CAP_SCX_KF_ENQUEUE);
> +			break;
> +		case offsetof(struct sched_ext_ops, dispatch):
> +			ENABLE_BPF_CAPABILITY(bpf_capabilities, BPF_CAP_SCX_KF_DISPATCH);
> +			break;
> +		case offsetof(struct sched_ext_ops, running):
> +		case offsetof(struct sched_ext_ops, stopping):
> +		case offsetof(struct sched_ext_ops, enable):
> +			ENABLE_BPF_CAPABILITY(bpf_capabilities, BPF_CAP_SCX_KF_REST);
> +			break;
> +		case offsetof(struct sched_ext_ops, init):
> +		case offsetof(struct sched_ext_ops, exit):
> +			ENABLE_BPF_CAPABILITY(bpf_capabilities, BPF_CAP_SCX_KF_UNLOCKED);
> +			break;
> +		default:
> +			return -EINVAL;
> +		}
> +	} else {
> +		switch (context_info) {
> +		case offsetof(struct sched_ext_ops, select_cpu):
> +			DISABLE_BPF_CAPABILITY(bpf_capabilities, BPF_CAP_SCX_KF_SELECT_CPU);
> +			DISABLE_BPF_CAPABILITY(bpf_capabilities, BPF_CAP_SCX_KF_ENQUEUE);
> +			break;
> +		case offsetof(struct sched_ext_ops, enqueue):
> +			DISABLE_BPF_CAPABILITY(bpf_capabilities, BPF_CAP_SCX_KF_ENQUEUE);
> +			break;
> +		case offsetof(struct sched_ext_ops, dispatch):
> +			DISABLE_BPF_CAPABILITY(bpf_capabilities, BPF_CAP_SCX_KF_DISPATCH);
> +			break;
> +		case offsetof(struct sched_ext_ops, running):
> +		case offsetof(struct sched_ext_ops, stopping):
> +		case offsetof(struct sched_ext_ops, enable):
> +			DISABLE_BPF_CAPABILITY(bpf_capabilities, BPF_CAP_SCX_KF_REST);
> +			break;
> +		case offsetof(struct sched_ext_ops, init):
> +		case offsetof(struct sched_ext_ops, exit):
> +			DISABLE_BPF_CAPABILITY(bpf_capabilities, BPF_CAP_SCX_KF_UNLOCKED);
> +			break;
> +		default:
> +			return -EINVAL;
> +		}
> +	}
> +	return 0;
> +}

And this callback defeats the whole point of the u32 bitmask.

In the earlier patch:

  env->context_info = __btf_member_bit_offset(t, member) / 8; // moff

is also wrong. The context_info name is too generic and misleading,
and 'env' isn't the right place to save moff.

Let's try to implement what was discussed earlier:

1. After successful check_struct_ops_btf_id(), save moff in
   prog->aux->attach_st_ops_member_off.
2. Add a .filter callback to the sched-ext kfunc registration path and
   let it allow/deny kfuncs based on the st_ops attach point.
3. Remove scx_kf_allow() and current->scx.kf_mask.

That will be a nice perf win and will prove that this approach works
end-to-end.