On Mon, Apr 11, 2022 at 6:04 PM Martin KaFai Lau <kafai@xxxxxx> wrote:
>
> On Mon, Apr 11, 2022 at 12:07:20PM -0700, Stanislav Fomichev wrote:
> > On Fri, Apr 8, 2022 at 3:13 PM Martin KaFai Lau <kafai@xxxxxx> wrote:
> > >
> > > On Thu, Apr 07, 2022 at 03:31:07PM -0700, Stanislav Fomichev wrote:
> > > > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > > > index 064eccba641d..eca258ba71d8 100644
> > > > --- a/kernel/bpf/bpf_lsm.c
> > > > +++ b/kernel/bpf/bpf_lsm.c
> > > > @@ -35,6 +35,98 @@ BTF_SET_START(bpf_lsm_hooks)
> > > >  #undef LSM_HOOK
> > > >  BTF_SET_END(bpf_lsm_hooks)
> > > >
> > > > +static unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx,
> > > > +						const struct bpf_insn *insn)
> > > > +{
> > > > +	const struct bpf_prog *prog;
> > > > +	struct socket *sock;
> > > > +	struct cgroup *cgrp;
> > > > +	struct sock *sk;
> > > > +	int ret = 0;
> > > > +	u64 *regs;
> > > > +
> > > > +	regs = (u64 *)ctx;
> > > > +	sock = (void *)(unsigned long)regs[BPF_REG_0];
> > > > +	/*prog = container_of(insn, struct bpf_prog, insnsi);*/
> > > > +	prog = (const struct bpf_prog *)((void *)insn - offsetof(struct bpf_prog, insnsi));
> > > nit. Rename prog to shim_prog.
> > >
> > > > +
> > > > +	if (unlikely(!sock))
> > > Is it possible in the lsm hooks? Can these hooks
> > > be rejected at load time instead?
> > Doesn't seem like it can be NULL, at least from the quick review that
> > I had; I'll take a deeper look.
> > I guess in general I wanted to be more defensive here because there
> > are 200+ hooks, new ones might arrive, and it's better to have the
> > check?
> not too worried about an extra runtime check for now.
> Instead, I have a concern that it will be a usage surprise when a
> successfully attached bpf program is then always silently ignored.
>
> Another question, for example, the inet_conn_request lsm_hook:
> LSM_HOOK(int, 0, inet_conn_request, const struct sock *sk, struct sk_buff *skb,
>	 struct request_sock *req)
>
> 'struct sock *sk' is the first argument, so it will use current's cgroup.
> inet_conn_request() likely runs in a softirq though, and then it will be
> incorrect. This runs-in-softirq case may not be limited to hooks that
> take an sk/sock argument either, not sure.

For now, I decided not to treat 'struct sock' cases as 'socket'
because of cases like sk_alloc_security where 'struct sock' is not
initialized. Looks like treating them as 'current' is also not 100%
foolproof. I guess we'd still have to have some special
cases/exceptions. Let me bring back that 'struct sock' handler and add
some btf-set to treat the hooks like inet_conn_request as the
exception for now.

> > > > +		return 0;
> > > > +
> > > > +	sk = sock->sk;
> > > > +	if (unlikely(!sk))
> > > Same here.
> > >
> > > > +		return 0;
> > > > +
> > > > +	cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> > > > +	if (likely(cgrp))
> unrelated, but while talking about extra checks:
>
> I think the shim_prog has already acted as a higher-level (per attach-btf_id)
> knob, but do you think it may still be worth doing a bpf_empty_prog_array
> check here in case a cgroup does not have a prog to run?

Oh yeah, good idea, let me add those cgroup_bpf_sock_enabled checks.

> > > > +		ret = BPF_PROG_RUN_ARRAY_CG(cgrp->bpf.effective[prog->aux->cgroup_atype],
> > > > +					    ctx, bpf_prog_run, 0);

> > [ ... ]
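To make that concrete: the existing cgroup_bpf_sock_enabled() macro already
does this kind of emptiness test, so the shim could bail out before
BPF_PROG_RUN_ARRAY_CG() when the per-atype effective array is empty.
Open-coded as a function here for readability (untested sketch; the name
lsm_cgroup_sock_enabled is made up, and it assumes an empty effective[]
slot still points at bpf_empty_prog_array.hdr, which is what
bpf_prog_array_alloc(0, ...) returns):

static inline bool lsm_cgroup_sock_enabled(struct sock *sk,
					   enum cgroup_bpf_attach_type atype)
{
	struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
	struct bpf_prog_array *array;

	/* An empty effective[] slot points at bpf_empty_prog_array.hdr,
	 * so a pointer comparison is enough to skip the run-array call.
	 */
	array = rcu_access_pointer(cgrp->bpf.effective[atype]);
	return array != &bpf_empty_prog_array.hdr;
}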
>
> > > > @@ -100,6 +123,15 @@ static void bpf_cgroup_link_auto_detach(struct bpf_cgroup_link *link)
> > > >  	link->cgroup = NULL;
> > > >  }
> > > >
> > > > +static void bpf_cgroup_lsm_shim_release(struct bpf_prog *prog,
> > > > +					enum cgroup_bpf_attach_type atype)
> > > > +{
> > > > +	if (!prog || atype != prog->aux->cgroup_atype)
> > > prog cannot be NULL here, no?
> > >
> > > The 'atype != prog->aux->cgroup_atype' also looks suspicious, considering
> > > prog->aux->cgroup_atype is only initialized (and meaningful) for BPF_LSM_CGROUP.
> > > I suspect incorrectly passing this test will crash in the below
> > > bpf_trampoline_unlink_cgroup_shim(). More on this later.
> > >
> > > > +		return;
> > > > +
> > > > +	bpf_trampoline_unlink_cgroup_shim(prog);
> > > > +}
> > > > +
> > > >  /**
> > > >   * cgroup_bpf_release() - put references of all bpf programs and
> > > >   *                        release all cgroup bpf data
> > > > @@ -123,10 +155,16 @@ static void cgroup_bpf_release(struct work_struct *work)
> > > Copying some missing loop context here:
> > >
> > > 	for (atype = 0; atype < ARRAY_SIZE(cgrp->bpf.progs); atype++) {
> > > 		struct list_head *progs = &cgrp->bpf.progs[atype];
> > > 		struct bpf_prog_list *pl, *pltmp;
> > >
> > > >  		list_for_each_entry_safe(pl, pltmp, progs, node) {
> > > >  			list_del(&pl->node);
> > > > -			if (pl->prog)
> > > > +			if (pl->prog) {
> > > > +				bpf_cgroup_lsm_shim_release(pl->prog,
> > > > +							    atype);
> > > atype could be 0 (CGROUP_INET_INGRESS) here. bpf_cgroup_lsm_shim_release()
> > > above will go ahead with bpf_trampoline_unlink_cgroup_shim().
> > > It will break some of the assumptions, e.g. prog->aux->attach_btf is NULL
> > > for CGROUP_INET_INGRESS.
> > >
> > > Instead, only call bpf_cgroup_lsm_shim_release() for BPF_LSM_CGROUP?
> > >
> > > If the above observation is sane, I wonder if the existing test_progs
> > > have uncovered it, or maybe the existing tests just always detach
> > > cleanly themselves before cleaning up the cgroup, which then avoided
> > > this case.
> > Might be what's happening here:
> > https://github.com/kernel-patches/bpf/runs/5876983908?check_suite_focus=true
> hmm.... this one looks different. I am thinking the oops should happen
> in bpf_obj_id() which is not inlined. It didn't ring any bell after a
> quick look, so yeah, let's fix the known issue first.
> >
> > Although, I'm not sure why it's z15 only. Good point on filtering by
> > BPF_LSM_CGROUP, will do.
> >
> > > >  				bpf_prog_put(pl->prog);
> > > > -			if (pl->link)
> > > > +			}
> > > > +			if (pl->link) {
> > > > +				bpf_cgroup_lsm_shim_release(pl->link->link.prog,
> > > > +							    atype);
> > > >  				bpf_cgroup_link_auto_detach(pl->link);
> > > > +			}
> > > >  			kfree(pl);
> > > >  			static_branch_dec(&cgroup_bpf_enabled_key[atype]);
> > > >  		}

> > [ ... ]
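Re: filtering by BPF_LSM_CGROUP, roughly what I plan to try (sketch only;
it assumes BPF_LSM_CGROUP ends up in prog->expected_attach_type, which is
how the rest of the series would tell these programs apart):

static void bpf_cgroup_lsm_shim_release(struct bpf_prog *prog)
{
	/* Shims only exist for BPF_LSM_CGROUP programs. For any other
	 * atype (e.g. CGROUP_INET_INGRESS == 0), prog->aux->cgroup_atype
	 * and prog->aux->attach_btf are not meaningful, so bail out.
	 */
	if (prog->type != BPF_PROG_TYPE_LSM ||
	    prog->expected_attach_type != BPF_LSM_CGROUP)
		return;

	bpf_trampoline_unlink_cgroup_shim(prog);
}

Checking the program itself also makes the atype argument, and with it the
suspicious 'atype != prog->aux->cgroup_atype' test, unnecessary.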
>
> > > > +int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
> > > > +				    struct bpf_attach_target_info *tgt_info)
> > > > +{
> > > > +	struct bpf_prog *shim_prog = NULL;
> > > > +	struct bpf_trampoline *tr;
> > > > +	bpf_func_t bpf_func;
> > > > +	u64 key;
> > > > +	int err;
> > > > +
> > > > +	key = bpf_trampoline_compute_key(NULL, prog->aux->attach_btf,
> > > > +					 prog->aux->attach_btf_id);
> > > > +
> > > > +	err = bpf_lsm_find_cgroup_shim(prog, &bpf_func);
> > > > +	if (err)
> > > > +		return err;
> > > > +
> > > > +	tr = bpf_trampoline_get(key, tgt_info);
> > > > +	if (!tr)
> > > > +		return -ENOMEM;
> > > > +
> > > > +	mutex_lock(&tr->mutex);
> > > > +
> > > > +	shim_prog = cgroup_shim_find(tr, bpf_func);
> > > > +	if (shim_prog) {
> > > > +		/* Reusing existing shim attached by the other program.
> > > > +		 */
> > > The shim_prog is reused by >1 BPF_LSM_CGROUP progs and
> > > shim_prog is hidden from userspace also (no id), so it may be worth
> > > bringing this up:
> > >
> > > In __bpf_prog_enter(), other than the bpf stats of the shim_prog
> > > becoming useless (a very minor thing), it also checks
> > > shim_prog->active and bumps the misses counter. Now, the misses counter
> > > is no longer visible to users. Since it is actually running the cgroup prog,
> > > maybe there is no need for the active check?
> >
> > Agree that the active counter will probably be taken care of when the
> > actual program runs;
> iirc, the BPF_PROG_RUN_ARRAY_CG does not need the active counter.
>
> > but not sure it's worth the effort in trying to
> > remove it here?
> I was thinking that if the active counter got triggered and the
> BPF_LSM_CGROUP call got missed, then there is no way to tell this case
> got hit without exposing the stats of the shim_prog, and it could be a
> pretty hard problem to chase. It probably won't be an issue for
> non-sleepable now if the rcu_read_lock() maps to preempt_disable().
> Not sure about the future sleepable case.
>
> I am thinking to avoid doing all the active count and stats count
> in __bpf_prog_enter() and __bpf_prog_exit() for BPF_LSM_CGROUP. afaik,
> only the rcu_read_lock and rcu_read_unlock are useful to protect
> the shim_prog itself. Maybe a __bpf_nostats_enter() and
> __bpf_nostats_exit().

SG, let me try to skip that for the BPF_LSM_CGROUP case.

> > Regarding "no longer visible to users": that's a good point. Should I
> > actually add those shim progs to the prog_idr? Or just hide them as an
> > "internal implementation detail"?
> Then no need to expose the shim_progs to the idr.
>
> ~~~~
> [ btw, while thinking about the shim_prog, I also think there is no need
> for one shim_prog for each attach_btf_id, which is essentially
> prog->aux->cgroup_atype. The static prog->aux->cgroup_atype can be
> passed on the stack when preparing the trampoline.
> Just an idea, not suggesting it must be done now. This can be
> optimized later since it does not affect the API. ]

Ack, I guess in theory there need to be only two "global" shim_progs,
one for 'struct socket' and another for 'current' (or more, for other
types). I went with allocating an instance per trampoline to avoid
having that global state. Working under tr->mutex simplifies things a
bit imo, but, as you said, we can optimize here if needed.
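Btw, for skipping the active/stats accounting, here is a minimal sketch of
the __bpf_nostats_enter()/__bpf_nostats_exit() pair (names from your
suggestion; the bodies just mirror the current non-sleepable
__bpf_prog_enter()/__bpf_prog_exit() minus the prog->active check and the
stats/misses updates, with NO_START_TIME telling the exit path that there
are no stats to fold in):

static u64 notrace __bpf_nostats_enter(struct bpf_prog *prog)
{
	/* The shim only dispatches to the cgroup effective array;
	 * rcu_read_lock() is still needed to keep the shim_prog
	 * itself alive while it runs.
	 */
	rcu_read_lock();
	migrate_disable();
	return NO_START_TIME;
}

static void notrace __bpf_nostats_exit(struct bpf_prog *prog, u64 start)
{
	migrate_enable();
	rcu_read_unlock();
}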