On Thu, Jan 9, 2025 at 12:49 PM Song Liu <song@xxxxxxxxxx> wrote: > > On Thu, Jan 9, 2025 at 11:24 AM Alexei Starovoitov > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > On Mon, Dec 23, 2024 at 4:51 PM Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote: > > > > > > > > > > > The main goal is to get rid of run-time mask check in SCX_CALL_OP() and > > > > make it static by the verifier. To make that happen scx_kf_mask flags > > > > would need to become KF_* flags while each struct-ops callback will > > > > specify the expected mask. > > > > Then at struct-ops prog attach time the verifier will see the expected mask > > > > and can check that all kfuncs calls of this particular program > > > > satisfy the mask. Then all of the runtime overhead of > > > > current->scx.kf_mask and scx_kf_allowed() will go away. > > > > > > Thanks for pointing this out. > > > > > > Yes, I am interested in working on it. > > > > > > I will try to solve this problem in a separate patch series. > > > > > > > > > The following are my thoughts: > > > > > > Should we really use KF_* to do this? I think KF_* is currently more > > > like declaring that a kfunc has some kind of attribute, e.g. > > > KF_TRUSTED_ARGS means that the kfunc only accepts trusted arguments, > > > rather than being used to categorise kfuncs. > > > > > > It is not sustainable to restrict the kfuncs that can be used based on > > > program types, which are coarse-grained. This problem will get worse > > > as kfuncs increase. > > > > > > In my opinion, managing the kfuncs available to bpf programs should be > > > implemented as capabilities. Capabilities are a mature permission model. > > > We can treat a set of kfuncs as a capability (like the various current > > > kfunc_sets, but the current kfunc_sets did not carefully divide > > > permissions). > > > > > > We should use separate BPF_CAP_XXX flags to manage these capabilities. > > > For example, SCX may define BPF_CAP_SCX_DISPATCH. > > > > > > For program types, we should divide them into two levels, types and > > > subtypes. Types are used to register common capabilities and subtypes > > > are used to register specific capabilities. The verifier can check if > > > the used kfuncs are allowed based on the type and subtype of the bpf > > > program. > > > > > > I understand that we need to maintain backward compatibility to > > > userspace, but capabilities are internal changes in the kernel. > > > Perhaps we can make the current program types as subtypes and > > > add 'types' that are only used internally, and more subtypes > > > (program types) can be added in the future. > > > > Sorry for the delay. > > imo CAP* approach doesn't fit. > > caps are security bits exposed to user space. > > Here there is no need to expose anything to user space. > > > > But you're also correct that we cannot extend kfunc KF_* flags > > that easily. KF_* flags are limited to 32-bit and we're already > > using 12 bits. > > enum scx_kf_mask needs 5 bits, so we can squeeze them into > > the current 32-bit field _for now_, > > but eventually we'd need to refactor kfunc definition into a wider set: > > BTF_ID_FLAGS(func, .. KF_*) > > so that different struct_ops consumers can define their own bits. > > > > Right now SCX is the only st_ops consumer who needs this feature, > > so let's squeeze into the existing KF facility. > > > > First step is to remap scx_kf_mask bits into unused bits in KF_ > > and annotate corresponding sched-ext kfuncs with it. > > For example: > > SCX_KF_DISPATCH will become > > KF_DISPATCH (1 << 13) > > > > and all kfuncs that are allowed to be called from ->dispatch() callback > > will be annotated like: > > - BTF_KFUNCS_START(scx_kfunc_ids_dispatch) > > - BTF_ID_FLAGS(func, scx_bpf_dispatch_nr_slots) > > - BTF_ID_FLAGS(func, scx_bpf_dispatch_cancel) > > + BTF_KFUNCS_START(scx_kfunc_ids_dispatch) > > + BTF_ID_FLAGS(func, scx_bpf_dispatch_nr_slots, KF_DISPATCH) > > + BTF_ID_FLAGS(func, scx_bpf_dispatch_cancel, KF_DISPATCH) > > > > > > For sched_ext_ops callback annotations, I think, > > the simplest approach is to add special > > BTF_SET8_START(st_ops_flags) > > BTF_ID_FLAGS(func, sched_ext_ops__dispatch, KF_DISPATCH) > > and so on for other ops stubs. > > > > sched_ext_ops__dispatch() is an empty function that > > exists in the vmlinux, and though it's not a kfunc > > we can use it to annotate > > (struct sched_ext_ops *)->dispatch() callback > > with a particular KF_ flag > > (or a set of flags for SCX_KF_RQ_LOCKED case). > > > > Then the verifier (while analyzing the program that is targeted > > to be attach to this ->dispatch() hook) > > will check this extra KF flag in st_ops > > and will only allow to call kfuncs with matching flags: > > > > if (st_ops->kf_mask & kfunc->kf_mask) // ok to call kfunc from this callback > > > > The end result current->scx.kf_mask will be removed > > and instead of run-time check it will become static verifier check. > > Shall we move some of these logics from verifier core to > btf_kfunc_id_set.filter()? IIUC, this would avoid using extra > KF_* bits. To make the filter functions more capable, we > probably need to pass bpf_verifier_env into the filter() function. Passing env is probably unnecessary, but if save 'moff': const struct btf_type *t, const struct btf_member *member, u32 moff = __btf_member_bit_offset(t, member) / 8; after successful check_struct_ops_btf_id() somewhere in prog->aux then btf_kfunc_id_set.filter() can indeed do moff == offsetof(struct sched_ext_ops, dispatch) allow kfuncs suitable for dispatch.