Re: [PATCH bpf-next v9 1/5] bpf: Introduce cgroup iter

Hao Luo <haoluo@xxxxxxxxxx> · Thu, 25 Aug 2022 10:58:26 -0700

On Thu, Aug 25, 2022 at 8:24 AM Michal Koutný <mkoutny@xxxxxxxx> wrote:
>
> Hello.
>
> On Tue, Aug 23, 2022 at 08:00:27PM -0700, Hao Luo <haoluo@xxxxxxxxxx> wrote:
> > +static int bpf_iter_attach_cgroup(struct bpf_prog *prog,
> > +                               union bpf_iter_link_info *linfo,
> > +                               struct bpf_iter_aux_info *aux)
> > +{
> > +     int fd = linfo->cgroup.cgroup_fd;
> > +     u64 id = linfo->cgroup.cgroup_id;
> > +     int order = linfo->cgroup.order;
> > +     struct cgroup *cgrp;
> > +
> > +     if (order != BPF_ITER_DESCENDANTS_PRE &&
> > +         order != BPF_ITER_DESCENDANTS_POST &&
> > +         order != BPF_ITER_ANCESTORS_UP &&
> > +         order != BPF_ITER_SELF_ONLY)
> > +             return -EINVAL;
> > +
> > +     if (fd && id)
> > +             return -EINVAL;
> > +
> > +     if (fd)
> > +             cgrp = cgroup_get_from_fd(fd);
> > +     else if (id)
> > +             cgrp = cgroup_get_from_id(id);
> > +     else /* walk the entire hierarchy by default. */
> > +             cgrp = cgroup_get_from_path("/");
> > +
> > +     if (IS_ERR(cgrp))
> > +             return PTR_ERR(cgrp);
>
> This section caught my eye.
>
> Perhaps the simpler way for the default hierachy fallback would be
>
>                 cgrp = &cgrp_dfl_root.cgrp;
>                 cgroup_get(cgroup)
>
> But maybe it's not what is the intention if cgroup NS should be taken
> into account and cgroup_get_from_path() is buggy in this regard.
>
> Would it make sense to prepend the patch below to your series?

Yep. Being able to take cgroup NS into account is my intention here.
Right now, I imagine the control plane who uses this default option is
the system daemon which lives in the init cgroup ns. They could always
specify a particular cgroup using ID or FD. Printing the whole
hierarchy is something a system daemon would do. Anyhow, can I add the
appended patch if there is going to be a v10 of this patch series? If
v9 is ok to merge, I can send the appended patch separately. Does that
sound good? The appended patch looks good to me, thanks!

>
> Also, that makes me think about iter initialization with ID. In contrast
> with FD passing (that's subject to some permissions and NS checks), the
> retrieval via ID is not equipped with that, ids are not unguessable and
> I'd consider cgroup IDs an implementation detail.
>
> So, is the ID initialization that much useful? (I have no idea about
> permissions model of BPF here, so it might be just fine but still it'd
> be good to take cgroup NS into account. Likely for BPF_ITER_ANCESTORS_UP
> too.)
>

Permission is a valid point about FD. There was discussion in an
earlier version of this patch series [0]. The good thing about ID is
that it can be passed across processes and it's meaningful to appear
in logs. It's more user-friendly. So we decided to support both.

[0] https://lore.kernel.org/netdev/YuK+eg3lgwJ2CJnJ@xxxxxxxxxxxxxxx/

> HTH,
> Michal
>
> ----8<----
> From 1098e60e89d4d901b7eef04e531f2c889309a91b Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Michal=20Koutn=C3=BD?= <mkoutny@xxxxxxxx>
> Date: Thu, 25 Aug 2022 15:19:04 +0200
> Subject: [PATCH] cgroup: Honor caller's cgroup NS when resolving path
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> cgroup_get_from_path() is not widely used function. Its callers presume
> the path is resolved under cgroup namespace. (There is one caller
> currently and resolving in init NS won't make harm (netfilter). However,
> future users may be subject to different effects when resolving
> globally.)
> Since, there's currently no use for the global resolution, modify the
> existing function to take cgroup NS into account.
>
> Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces")
> Signed-off-by: Michal Koutný <mkoutny@xxxxxxxx>
> ---
>  kernel/cgroup/cgroup.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index ffaccd6373f1..9280f4b41d8b 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -6603,8 +6603,12 @@ struct cgroup *cgroup_get_from_path(const char *path)
>  {
>         struct kernfs_node *kn;
>         struct cgroup *cgrp = ERR_PTR(-ENOENT);
> +       struct cgroup *root_cgrp;
>
> -       kn = kernfs_walk_and_get(cgrp_dfl_root.cgrp.kn, path);
> +       spin_lock_irq(&css_set_lock);
> +       root_cgrp = current_cgns_cgroup_from_root(&cgrp_dfl_root);
> +       kn = kernfs_walk_and_get(root_cgrp->kn, path);
> +       spin_unlock_irq(&css_set_lock);
>         if (!kn)
>                 goto out;
>
>
> base-commit: 3cc40a443a04d52b0c95255dce264068b01e9bfe
> --
> 2.37.0
>