Re: [RFC PATCH bpf-next 1/5] cgroup: Enable task_under_cgroup_hierarchy() on cgroup1

Yafang Shao <laoar.shao@xxxxxxxxx> · Thu, 7 Sep 2023 11:05:07 +0800

On Thu, Sep 7, 2023 at 4:13 AM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> Hello,
>
> On Sun, Sep 03, 2023 at 02:27:56PM +0000, Yafang Shao wrote:
> >  static inline bool task_under_cgroup_hierarchy(struct task_struct *task,
> >                                              struct cgroup *ancestor)
> >  {
> >       struct css_set *cset = task_css_set(task);
> > +     struct cgroup *cgrp;
> > +     bool ret = false;
> > +     int ssid;
> > +
> > +     if (ancestor->root == &cgrp_dfl_root)
> > +             return cgroup_is_descendant(cset->dfl_cgrp, ancestor);
> > +
> > +     for (ssid = 0; ssid < CGROUP_SUBSYS_COUNT; ssid++) {
> > +             if (!ancestor->subsys[ssid])
> > +                     continue;
> >
> > -     return cgroup_is_descendant(cset->dfl_cgrp, ancestor);
> > +             cgrp = task_css(task, ssid)->cgroup;
> > +             if (!cgrp)
> > +                     continue;
> > +
> > +             if (!cgroup_is_descendant(cgrp, ancestor))
> > +                     return false;
> > +             if (!ret)
> > +                     ret = true;
> > +     }
> > +     return ret;
>
> I feel ambivalent about adding support for this in cgroup1 especially given
> that this can only work for fd based interface which is worse than the ID
> based ones.

The fd-based cgroup interface plays a crucial role in BPF programs,
particularly in components such as cgroup_iter, bpf_cgrp_storage, and
cgroup_array maps, as well as in attaching BPF programs to cgroups and
detaching them.

However, as far as I know, bpf_cgrp_storage, cgroup_array, and cgroup
attachment/detachment work only through the fd-based cgroup interface.
Unfortunately, all of these features are limited to cgroup2, which
poses challenges in containerized environments.

To enable seamless BPF integration within our Kubernetes environment,
we've been exploring the possibility of transitioning from cgroup1 to
cgroup2. That transition is desirable in the long run, but it is
complex because numerous applications would need to adapt.

We acknowledge that cgroup2 is the future, but we also understand that
such transitions take time and effort. Consequently, we are
considering an alternative approach: rather than migrating to cgroup2
right away, modify the BPF kernel code so that it also works with
cgroup1. It appears that this would require only minor adjustments,
which makes the option more palatable.

> Even if we're doing this, the above is definitely not what we
> want to do as it won't work for controller-less hierarchies like the one
> that systemd used to use.

Right. It can't work for /sys/fs/cgroup/systemd/.

> You'd have to lock css_set_lock and walk the
> cgrp_cset_links.

That seems better. Will investigate it.  Thanks for your suggestion.
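
To make the difference concrete, here is a rough user-space model of
what such a walk would compute. It is only a sketch, not kernel code:
every toy_* name is hypothetical, the array stands in for the links
between a css_set and its cgroups, and locking (css_set_lock in the
kernel) is omitted. The point is that the lookup is keyed on the
hierarchy root rather than on a subsystem, so it also covers a
controller-less hierarchy.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; these types are hypothetical stand-ins, not kernel structs. */
struct toy_root {
	const char *name;		/* e.g. "cpu" or "systemd" */
};

struct toy_cgroup {
	struct toy_root *root;		/* hierarchy this cgroup belongs to */
	struct toy_cgroup *parent;	/* NULL for the root cgroup */
};

/* Stand-in for a css_set: one linked cgroup per mounted hierarchy. */
struct toy_cset {
	struct toy_cgroup **cgrps;
	size_t nr;
};

/* As in the kernel helper, a cgroup counts as its own descendant. */
static bool toy_is_descendant(const struct toy_cgroup *cgrp,
			      const struct toy_cgroup *ancestor)
{
	for (; cgrp; cgrp = cgrp->parent)
		if (cgrp == ancestor)
			return true;
	return false;
}

/*
 * Find the task's cgroup in the ancestor's hierarchy by walking the
 * links, then test ancestry. No subsystem is consulted, so a
 * hierarchy without any controller is handled the same way.
 */
static bool toy_task_under_hierarchy(const struct toy_cset *cset,
				     const struct toy_cgroup *ancestor)
{
	for (size_t i = 0; i < cset->nr; i++) {
		if (cset->cgrps[i]->root == ancestor->root)
			return toy_is_descendant(cset->cgrps[i], ancestor);
	}
	return false;	/* task has no cgroup in that hierarchy */
}
```

In the per-subsystem loop from the patch, an ancestor on a
controller-less hierarchy has no subsys[] entry set, so the loop never
runs a comparison and the function returns false.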

--
Regards
Yafang