Re: [RFC PATCH bpf-next 1/9] bpf: introduce CGROUP_SUBSYS_RSTAT program type

Yosry Ahmed <yosryahmed@xxxxxxxxxx> · Tue, 10 May 2022 13:43:46 -0700

On Tue, May 10, 2022 at 12:59 PM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> Hello,
>
> On Tue, May 10, 2022 at 12:34:42PM -0700, Yosry Ahmed wrote:
> > The rationale behind associating this work with cgroup_subsys is that
> > usually the stats are associated with a resource (e.g. memory, cpu,
> > etc). For example, if the memory controller is only enabled for a
> > subtree in a big hierarchy, it would be more efficient to only run BPF
> > rstat programs for those cgroups, not the entire hierarchy. It
> > provides a way to control what part of the hierarchy you want to
> > collect stats for. This is also semantically similar to the
> > css_rstat_flush() callback.
>
> Hmm... one major point of rstat is not having to worry about these things
> because we iterate what's been active rather than what exists. Now, this
> isn't entirely true because we share the same updated list for all sources.
> This is a trade-off which makes sense because 1. the number of cgroups to
> iterate each cycle is generally really low anyway 2. different controllers
> often get enabled together. If the balance tilts towards "we're walking too
> many due to the sharing of updated list across different sources", the
> solution would be splitting the updated list so that we make the walk finer
> grained.
>
> Note that the above doesn't really affect the conceptual model. It's purely
> an optimization decision. Tying these things to a cgroup_subsys does affect
> the conceptual model and, in this case, the userland API for a performance
> consideration which can be solved otherwise.
>
> So, let's please keep this simple and in the (unlikely) case that the
> overhead becomes an issue, solve it from rstat operation side.
>
> Thanks.

I assume if we do this optimization, and have separate updated lists
for controllers, we will still have a "core" updated list that is not
tied to any controller. Is this correct?

If yes, then we can make the interface controller-agnostic (a global
list of BPF flushers). If we do the optimization later, we tie BPF
stats to the "core" updated list. We can even extend the userland
interface then to allow for controller-specific BPF stats if found
useful.
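
To make the controller-agnostic option concrete, here is a rough
userspace sketch of what I have in mind. It is not kernel code: every
name in it (struct cgroup_model, register_flusher(), flush_updated(),
propagate_flusher()) is made up for illustration, and a plain function
pointer stands in for an attached BPF program. The only point is that
the flusher list is global and the walk only visits cgroups marked as
updated.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FLUSHERS 8

/* Hypothetical stand-in for a cgroup; not the kernel's struct cgroup. */
struct cgroup_model {
	const char *name;
	int updated;			/* on the shared "updated" list? */
	long pending;			/* delta not yet flushed */
	long total;			/* flushed, aggregated value */
	struct cgroup_model *parent;
};

/* A function pointer plays the role of an attached BPF flusher. */
typedef void (*flusher_fn)(struct cgroup_model *cgrp);

/* One global, controller-agnostic list of flushers. */
static flusher_fn flushers[MAX_FLUSHERS];
static size_t nr_flushers;

static int register_flusher(flusher_fn fn)
{
	assert(fn != NULL);
	if (nr_flushers == MAX_FLUSHERS)
		return -1;
	flushers[nr_flushers++] = fn;
	return 0;
}

/* Example flusher: fold the pending delta in and pass it to the parent. */
static void propagate_flusher(struct cgroup_model *cgrp)
{
	cgrp->total += cgrp->pending;
	if (cgrp->parent) {
		cgrp->parent->pending += cgrp->pending;
		cgrp->parent->updated = 1;
	}
	cgrp->pending = 0;
}

/*
 * Run every registered flusher on each updated cgroup and return how
 * many cgroups were visited. The caller passes cgroups ordered children
 * before parents, so a parent marked during the walk is still reached.
 */
static size_t flush_updated(struct cgroup_model **cgrps, size_t n)
{
	size_t visited = 0;

	for (size_t i = 0; i < n; i++) {
		if (!cgrps[i]->updated)
			continue;	/* inactive cgroups are skipped */
		for (size_t j = 0; j < nr_flushers; j++)
			flushers[j](cgrps[i]);
		cgrps[i]->updated = 0;
		visited++;
	}
	return visited;
}
```

Nothing here mentions a controller. If the updated list is split per
controller later, only the set of cgroups handed to flush_updated()
would change, and the registration side could stay as it is.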

If not, and there will only be controller-specific updated lists,
then we might need to maintain a "core" updated list just for the sake
of BPF programs, which I don't think would be favorable.

What do you think? Either way, I will try to document the outcome of
this discussion in the commit message (and maybe the code), so that
if and when this optimization is made, we can come back to it.

>
> --
> tejun