Re: [RFC bpf-next] Hierarchical Cgroup Stats Collection Using BPF

Yosry Ahmed <yosryahmed@xxxxxxxxxx> · Wed, 16 Mar 2022 09:35:05 -0700

Hi Tejun,

Thanks for taking the time to read my proposal! Sorry for the late
reply. This email skipped my inbox for some reason.

On Sun, Mar 13, 2022 at 10:35 PM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> Hello,
>
> On Wed, Mar 09, 2022 at 12:27:15PM -0800, Yosry Ahmed wrote:
> ...
> > These problems are already addressed by the rstat aggregation
> > mechanism in the kernel, which is primarily used for memcg stats. We
>
> Not that it matters all that much but I don't think the above statement is
> true given that sched stats are an integrated part of the rstat
> implementation and io was converted before memcg.
>

Excuse my ignorance, I am new to kernel development. I only saw calls
to cgroup_rstat_updated() in memcg and io and assumed they were the
only users. Now I found cgroup_account_cputime() :)

> > - For every cgroup, we will either use flags to distinguish BPF stats
> > updates from normal stats updates, or flush both anyway (memcg stats
> > are periodically flushed anyway).
>
> I'd just keep them together. Usually most activities tend to happen
> together, so it's cheaper to aggregate all of them in one go in most cases.

This makes sense to me, thanks.
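
To make the "keep them together" option concrete, here is a minimal
userspace sketch, not kernel code. All names (demo_cgroup, demo_pcpu,
demo_flush, DEMO_NR_CPUS) are hypothetical and only illustrate the
idea: either kind of update marks the per-CPU slot as updated, and a
single flush pass folds both kinds of deltas, so no per-kind flags are
needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NR_CPUS 4 /* illustrative, not the kernel's NR_CPUS */

/* Hypothetical per-CPU slot holding pending deltas for both stat kinds. */
struct demo_pcpu {
	bool updated;         /* set by any update, cleared by flush */
	uint64_t memcg_delta; /* pending "normal" stat delta */
	uint64_t bpf_delta;   /* pending BPF stat delta */
};

struct demo_cgroup {
	struct demo_pcpu pcpu[DEMO_NR_CPUS];
	uint64_t memcg_total;
	uint64_t bpf_total;
};

static void demo_update_memcg(struct demo_cgroup *cg, int cpu, uint64_t v)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	cg->pcpu[cpu].memcg_delta += v;
	cg->pcpu[cpu].updated = true;
}

static void demo_update_bpf(struct demo_cgroup *cg, int cpu, uint64_t v)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	cg->pcpu[cpu].bpf_delta += v;
	cg->pcpu[cpu].updated = true;
}

/*
 * One pass folds both kinds of deltas into the totals.
 * Returns the number of CPUs that had pending updates.
 */
static int demo_flush(struct demo_cgroup *cg)
{
	int flushed = 0;

	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		struct demo_pcpu *p = &cg->pcpu[cpu];

		if (!p->updated)
			continue; /* nothing pending on this CPU */
		cg->memcg_total += p->memcg_delta;
		cg->bpf_total += p->bpf_delta;
		p->memcg_delta = 0;
		p->bpf_delta = 0;
		p->updated = false;
		flushed++;
	}
	return flushed;
}
```

In this sketch a CPU that saw both kinds of updates is visited once per
flush rather than once per kind, which is the saving described above.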

>
> > - Provide flags to enable/disable using per-cpu arrays (for stats that
> > are not updated frequently), and enable/disable hierarchical
> > aggregation (non-hierarchical stats can still benefit from the
> > automatic creation and deletion of entries).
> > - Provide different hierarchical aggregation operations: SUM, MAX, MIN, etc.
> > - Instead of an array as the map value, use a struct, and let the user
> > provide an aggregator function in the form of a BPF program.
>
> I'm more partial to the last option. It does make the usage a bit more
> complicated but hopefully it shouldn't be too bad with good examples.
>
> I don't have strong opinions on the bpf side of things but it'd be great to
> be able to use rstat from bpf.

It indeed gives more flexibility but is more complicated. Also, I am
not sure about the overhead of calling a BPF program at every
aggregation step. Looking forward to getting feedback on the bpf side
of things.
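
As a rough illustration of the struct-plus-aggregator option, here is a
minimal userspace sketch. It is not the proposed BPF API: cg_stats,
cg_node, cg_aggregate_fn, cg_flush and sum_and_max are hypothetical
names, and a plain C function pointer stands in for the user-supplied
BPF program. It also assumes the nodes sit in an array where every
parent precedes its children.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical map value: one struct per cgroup instead of a flat array. */
struct cg_stats {
	uint64_t total_events;   /* should be summed over the subtree */
	uint64_t max_latency_ns; /* should take the max over the subtree */
};

/* Hypothetical aggregator; stands in for a user-supplied BPF program. */
typedef void (*cg_aggregate_fn)(struct cg_stats *parent,
				const struct cg_stats *child);

struct cg_node {
	struct cg_node *parent;  /* NULL for the root */
	struct cg_stats local;   /* stats recorded on this cgroup only */
	struct cg_stats subtree; /* local plus all descendants, after flush */
};

/*
 * Fold every node into its parent, deepest nodes first.
 * Assumes nodes[] is ordered so that parents precede their children.
 */
static void cg_flush(struct cg_node *nodes, size_t n, cg_aggregate_fn agg)
{
	for (size_t i = 0; i < n; i++)
		nodes[i].subtree = nodes[i].local;

	for (size_t i = n; i-- > 0; ) {
		if (nodes[i].parent)
			agg(&nodes[i].parent->subtree, &nodes[i].subtree);
	}
}

/* Example aggregator mixing SUM and MAX in a single struct. */
static void sum_and_max(struct cg_stats *parent, const struct cg_stats *child)
{
	parent->total_events += child->total_events;
	if (child->max_latency_ns > parent->max_latency_ns)
		parent->max_latency_ns = child->max_latency_ns;
}
```

In this sketch the aggregator runs once per non-root node on every
flush. That per-node call is the overhead I am unsure about when the
callback is a BPF program rather than a plain function.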

>
> Thanks.
>
> --
> tejun