On Thu, Jan 9, 2025 at 11:24 AM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Mon, Dec 23, 2024 at 4:51 PM Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote: > > > > > > > > The main goal is to get rid of run-time mask check in SCX_CALL_OP() and > > > make it static by the verifier. To make that happen scx_kf_mask flags > > > would need to become KF_* flags while each struct-ops callback will > > > specify the expected mask. > > > Then at struct-ops prog attach time the verifier will see the expected mask > > > and can check that all kfuncs calls of this particular program > > > satisfy the mask. Then all of the runtime overhead of > > > current->scx.kf_mask and scx_kf_allowed() will go away. > > > > Thanks for pointing this out. > > > > Yes, I am interested in working on it. > > > > I will try to solve this problem in a separate patch series. > > > > > > The following are my thoughts: > > > > Should we really use KF_* to do this? I think KF_* is currently more > > like declaring that a kfunc has some kind of attribute, e.g. > > KF_TRUSTED_ARGS means that the kfunc only accepts trusted arguments, > > rather than being used to categorise kfuncs. > > > > It is not sustainable to restrict the kfuncs that can be used based on > > program types, which are coarse-grained. This problem will get worse > > as kfuncs increase. > > > > In my opinion, managing the kfuncs available to bpf programs should be > > implemented as capabilities. Capabilities are a mature permission model. > > We can treat a set of kfuncs as a capability (like the various current > > kfunc_sets, but the current kfunc_sets did not carefully divide > > permissions). > > > > We should use separate BPF_CAP_XXX flags to manage these capabilities. > > For example, SCX may define BPF_CAP_SCX_DISPATCH. > > > > For program types, we should divide them into two levels, types and > > subtypes. Types are used to register common capabilities and subtypes > > are used to register specific capabilities. The verifier can check if > > the used kfuncs are allowed based on the type and subtype of the bpf > > program. > > > > I understand that we need to maintain backward compatibility to > > userspace, but capabilities are internal changes in the kernel. > > Perhaps we can make the current program types as subtypes and > > add 'types' that are only used internally, and more subtypes > > (program types) can be added in the future. > > Sorry for the delay. > imo CAP* approach doesn't fit. > caps are security bits exposed to user space. > Here there is no need to expose anything to user space. > > But you're also correct that we cannot extend kfunc KF_* flags > that easily. KF_* flags are limited to 32-bit and we're already > using 12 bits. > enum scx_kf_mask needs 5 bits, so we can squeeze them into > the current 32-bit field _for now_, > but eventually we'd need to refactor kfunc definition into a wider set: > BTF_ID_FLAGS(func, .. KF_*) > so that different struct_ops consumers can define their own bits. > > Right now SCX is the only st_ops consumer who needs this feature, > so let's squeeze into the existing KF facility. > > First step is to remap scx_kf_mask bits into unused bits in KF_ > and annotate corresponding sched-ext kfuncs with it. > For example: > SCX_KF_DISPATCH will become > KF_DISPATCH (1 << 13) > > and all kfuncs that are allowed to be called from ->dispatch() callback > will be annotated like: > - BTF_KFUNCS_START(scx_kfunc_ids_dispatch) > - BTF_ID_FLAGS(func, scx_bpf_dispatch_nr_slots) > - BTF_ID_FLAGS(func, scx_bpf_dispatch_cancel) > + BTF_KFUNCS_START(scx_kfunc_ids_dispatch) > + BTF_ID_FLAGS(func, scx_bpf_dispatch_nr_slots, KF_DISPATCH) > + BTF_ID_FLAGS(func, scx_bpf_dispatch_cancel, KF_DISPATCH) > > > For sched_ext_ops callback annotations, I think, > the simplest approach is to add special > BTF_SET8_START(st_ops_flags) > BTF_ID_FLAGS(func, sched_ext_ops__dispatch, KF_DISPATCH) > and so on for other ops stubs. > > sched_ext_ops__dispatch() is an empty function that > exists in the vmlinux, and though it's not a kfunc > we can use it to annotate > (struct sched_ext_ops *)->dispatch() callback > with a particular KF_ flag > (or a set of flags for SCX_KF_RQ_LOCKED case). > > Then the verifier (while analyzing the program that is targeted > to be attach to this ->dispatch() hook) > will check this extra KF flag in st_ops > and will only allow to call kfuncs with matching flags: > > if (st_ops->kf_mask & kfunc->kf_mask) // ok to call kfunc from this callback > > The end result current->scx.kf_mask will be removed > and instead of run-time check it will become static verifier check. Shall we move some of these logics from verifier core to btf_kfunc_id_set.filter()? IIUC, this would avoid using extra KF_* bits. To make the filter functions more capable, we probably need to pass bpf_verifier_env into the filter() function. Does this make sense? Thanks, Song