On Thu, Mar 26, 2020 at 4:35 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Tue, Mar 24, 2020 at 11:57:44PM -0700, Andrii Nakryiko wrote: > > > > +/* Swap updated BPF program for given link in effective program arrays across > > + * all descendant cgroups. This function is guaranteed to succeed. > > + */ > > +static void replace_effective_prog(struct cgroup *cgrp, > > + enum bpf_attach_type type, > > + struct bpf_cgroup_link *link) > > +{ > > + struct bpf_prog_array_item *item; > > + struct cgroup_subsys_state *css; > > + struct bpf_prog_array *progs; > > + struct bpf_prog_list *pl; > > + struct list_head *head; > > + struct cgroup *cg; > > + int pos; > > + > > + css_for_each_descendant_pre(css, &cgrp->self) { > > + struct cgroup *desc = container_of(css, struct cgroup, self); > > + > > + if (percpu_ref_is_zero(&desc->bpf.refcnt)) > > + continue; > > + > > + /* found position of link in effective progs array */ > > + for (pos = 0, cg = desc; cg; cg = cgroup_parent(cg)) { > > + if (pos && !(cg->bpf.flags[type] & BPF_F_ALLOW_MULTI)) > > + continue; > > + > > + head = &cg->bpf.progs[type]; > > + list_for_each_entry(pl, head, node) { > > + if (!prog_list_prog(pl)) > > + continue; > > + if (pl->link == link) > > + goto found; > > + pos++; > > + } > > + } > > +found: > > + BUG_ON(!cg); > > + progs = rcu_dereference_protected( > > + desc->bpf.effective[type], > > + lockdep_is_held(&cgroup_mutex)); > > + item = &progs->items[pos]; > > + WRITE_ONCE(item->prog, link->link.prog); > > + } > > +} > > + > > +/** > > + * __cgroup_bpf_replace() - Replace link's program and propagate the change > > + * to descendants > > + * @cgrp: The cgroup which descendants to traverse > > + * @link: A link for which to replace BPF program > > + * @type: Type of attach operation > > + * > > + * Must be called with cgroup_mutex held. > > + */ > > +int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link, > > + struct bpf_prog *new_prog) > > +{ > > + struct list_head *progs = &cgrp->bpf.progs[link->type]; > > + struct bpf_prog *old_prog; > > + struct bpf_prog_list *pl; > > + bool found = false; > > + > > + if (link->link.prog->type != new_prog->type) > > + return -EINVAL; > > + > > + list_for_each_entry(pl, progs, node) { > > + if (pl->link == link) { > > + found = true; > > + break; > > + } > > + } > > + if (!found) > > + return -ENOENT; > > + > > + old_prog = xchg(&link->link.prog, new_prog); > > + replace_effective_prog(cgrp, link->type, link); > > I think with 'found = true' in this function you're assuming that it will be > found in replace_effective_prog() ? I don't think that's the case. > Try to create bpf_link with BPF_F_ALLOW_OVERRIDE, override it in a child cgroup > with another link and then try to LINK_UPDATE the former. The link is there, > but the prog is not executing and it's not in effective array. What LINK_UPDATE > suppose to do? I guess it should succeed? Yes, this is a great catch! I should have used ALLOW_OVERRIDE at the root cgroup level in my selftest, it would catch it immediately. BUG_ON(!cg) in replace_effective_prog() is too aggressive, if I replace it with `if (!cg) continue;` it will handle this as well. > Even trickier that the prog will be in effective array in some of > css_for_each_descendant_pre() and not in others. This cgroup attach semantics > were convoluted from the day one. Apparently people use all three variants now, > but I wouldn't bet that everyone understands it. Agree about convoluted logic, spent enormous time understanding and modifying it :) But apart from BUG_ON(!cg) problem, everything works because each level of hierarchy is treated independently in replace_effective_prog(). Search for attached link on each level is reset and performed anew. If found - we replace program, if not - must be ALLOW_OVERRIDE case, i.e., actually overridden link. > Hence my proposal to support F_ALLOW_MULTI for links only. At least initially. > It's so much simpler to explain. And owning bpf_link will guarantee that the > prog is executing (unless cgroup is removed and sockets are closed). I guess > default (no-override) is acceptable to bpf_link as well and in that sense it > will be very similar to XDP with single prog attached. So I think I can live > with default and ALLOW_MULTI for now. But we should probably redesign > overriding capabilities. Folks need to attach multiple progs to a given cgroup > and disallow all progs in children. Currently it's not possible to do, since > MULTI in the parent allows at least one (default, override or multi) in the > children. bpf_link inheriting this logic won't help to solve this use case. It > feels that link should stay as multi only and override or not in the children > should be a separate property. Probably not related to link at all. It fits > better as a cgroup permission. Yeah, we had a brief discussion with Andrey on mailing list. Not sure what the solution looks like, but it should be orthogonal to link/prog attachment operation, probably. If you insist and Andrey is ok with dropping ALLOW_OVERRIDE, it's easy. But fixing the logic to handle it is also easy. So are we sure about supporting 2 out of 3 existing modes? :) > > Anyhow I'm going to apply patches 1 and 2, since they are good cleanup > regardless of what we decide here. Thanks, will rebase on top of bpf-next master for v3.