Re: [PATCH bpf-next v1 3/5] bpf: Introduce cgroup iter

Hao Luo <haoluo@xxxxxxxxxx> · Fri, 20 May 2022 10:30:30 -0700

Hi Tejun,

On Fri, May 20, 2022 at 1:11 AM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> Hello,
>
> On Fri, May 20, 2022 at 12:58:52AM -0700, Yosry Ahmed wrote:
> > On Fri, May 20, 2022 at 12:41 AM Tejun Heo <tj@xxxxxxxxxx> wrote:
> > >
> > > On Fri, May 20, 2022 at 01:21:31AM +0000, Yosry Ahmed wrote:
> > > > From: Hao Luo <haoluo@xxxxxxxxxx>
> > > >
> > > > Introduce a new type of iter prog: cgroup. Unlike other bpf_iter, this
> > > > iter doesn't iterate a set of kernel objects. Instead, it is supposed to
> > > > be parameterized by a cgroup id and prints only that cgroup. So one
> > > > needs to specify a target cgroup id when attaching this iter. The target
> > > > cgroup's state can be read out via a link of this iter.
> > > >
> > > > Signed-off-by: Hao Luo <haoluo@xxxxxxxxxx>
> > > > Signed-off-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> > >
> > > This could be me not understanding why it's structured this way but it keeps
> > > bothering me that this is adding a cgroup iterator which doesn't iterate
> > > cgroups. If all that's needed is extracting information from a specific
> > > cgroup, why does this need to be an iterator? e.g. why can't I use
> > > BPF_PROG_TEST_RUN which looks up the cgroup with the provided ID, flushes
> > > rstat, retrieves whatever information necessary and returns that as the
> > > result?
> >
> > I will let Hao and Yonghong reply here as they have a lot more
> > context, and they had previous discussions about cgroup_iter. I just
> > want to say that exposing the stats in a file is extremely convenient
> > for userspace apps. It becomes very similar to reading stats from
> > cgroupfs. It also makes migrating cgroup stats that we have
> > implemented in the kernel to BPF a lot easier.
>
> So, if it were upto me, I'd rather direct energy towards making retrieving
> information through TEST_RUN_PROG easier rather than clinging to making
> kernel output text. I get that text interface is familiar but it kinda
> sucks in many ways.
>

Tejun, could you explain more about the downside of text interfaces
and why TEST_RUN_PROG would address the problems in text output? From
the discussion we had last time, I understand that your concern was
the unstable interface if we introduce bpf files in cgroupfs, so we
are moving toward replicating the directory structure in bpffs. But I
am not sure about the issue of text format output

> > AFAIK there are also discussions about using overlayfs to have links
> > to the bpffs files in cgroupfs, which makes it even better. So I would
> > really prefer keeping the approach we have here of reading stats
> > through a file from userspace. As for how we go about this (and why a
> > cgroup iterator doesn't iterate cgroups) I will leave this for Hao and
> > Yonghong to explain the rationale behind it. Ideally we can keep the
> > same functionality under a more descriptive name/type.
>
> My answer would be the same here. You guys seem dead set on making the
> kernel emulate cgroup1. I'm not gonna explicitly block that but would
> strongly suggest having a longer term view.
>

The reason why Yosry and I are still pushing toward this direction is
that our user space applications rely heavily on extracting
information from text output for cgroups. Please understand that
migrating them from the traditional model to a new model is a bigger
pain. But I agree that if we have a better, concrete solution (for
example, maybe TEST_RUN_PROG) to convince them and help them migrate,
I really would love to contribute and work on it.

> If you *must* do the iterator, can you at least make it a proper iterator
> which supports seeking? AFAICS there's nothing fundamentally preventing bpf
> iterators from supporting seeking. Or is it that you need something which is
> pinned to a cgroup so that you can emulate the directory structure?
>

Yonghong may comment on adding seek for bpf_iter. I would love to
contribute if we are in need of that. Right now, we don't have a use
case that needs seek for bpf_iter, I think. My thought: for cgroups,
we can seek using cgroup id. Maybe, not all kernel objects are
indexable, so seeking doesn't apply there?

Hao