Re: [PATCH 0/9 RFC] cgroup: separate rstat trees

On Thu, Jan 16, 2025 at 7:19 AM Michal Koutný <mkoutny@xxxxxxxx> wrote:
>
> Hello.
>
> On Mon, Jan 13, 2025 at 10:25:34AM -0800, Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
> > > and flushing effectiveness depends on how individual readers are
> > > correlated,
> >
> > Sorry I am confused by the above statement, can you please expand on
> > what you meant by it?
> >
> > > OTOH writer correlation affects
> > > updaters when extending the update tree.
> >
> > Here I am confused about the difference between writer and updater.
>
> reader -- a call site that'd need to call cgroup_rstat_flush() to
>         calculate aggregated stats
> writer (or updater) -- a call site that calls cgroup_rstat_updated()
>         when it modifies whatever datum
>
> By correlated readers I meant that stats for multiple controllers are
> read close to each other (time-wise). First such a reader does the heavy
> lifting, consequent readers enjoy quick access.
> (With per-controller flushing, each reader would need to do the flush,
> and I suspect the total time would be non-linear wrt the parts.)

In this case, I actually think it's better if every reader pays for
the flush it asked for (and only that). There is a bit of repeated
work if we read memory stats and then io stats right after, but in
cases where we don't, paying to flush all subsystems just because they
are likely to be flushed soon is not necessarily a good thing imo.
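
For concreteness, here is a toy userspace model of the amortization
described above (a sketch with invented names; in the kernel, readers
go through cgroup_rstat_flush()). The first flush drains the dirty
set, so a second reader right behind it finds nothing left to do:

/* toy_flush.c - userspace model of flush amortization; the names are
 * invented and this is not the kernel implementation. */
#include <stdio.h>
#include <stdbool.h>

#define NR_CGROUPS 3

static bool dirty[NR_CGROUPS];	/* models the per-cpu updated tree */

/* A writer marks a cgroup as having pending stat updates (the role
 * cgroup_rstat_updated() plays in the kernel). */
static void model_updated(int cgrp)
{
	dirty[cgrp] = true;
}

/* A reader flushes: visit every dirty cgroup, aggregate, and clear.
 * Returns how many cgroups had to be visited (the "cost"). */
static int model_flush(void)
{
	int cost = 0;

	for (int i = 0; i < NR_CGROUPS; i++) {
		if (dirty[i]) {
			dirty[i] = false;	/* aggregation would go here */
			cost++;
		}
	}
	return cost;
}

int main(void)
{
	model_updated(0);
	model_updated(2);

	/* The first reader (say, memory.stat) pays for the whole walk... */
	printf("first flush visited %d cgroups\n", model_flush());
	/* ...a second reader right after (say, io.stat on a shared tree)
	 * finds the tree empty and is nearly free. */
	printf("second flush visited %d cgroups\n", model_flush());
	return 0;
}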

>
> Similarly for writers, if multiple controllers' data change in a short
> window, only the first one has to construct the rstat tree from the top
> down to self, the others are updating the same tree.

This I agree about. If we have consecutive updates from two different
subsystems to the same cgroup, almost all the work is repeated.
Whether that causes a tangible performance difference or not is
something the numbers should show. In my experience, real regressions
on the update side are usually caught by LKP and are somewhat easy to
surface in benchmarks (I used netperf in the past).
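
To make the update-side sharing concrete, here is a tiny userspace
model of the early-termination walk (a sketch with invented names; the
kernel does this per-cpu in cgroup_rstat_updated(), with real linked
lists rather than a flag):

/* toy_update.c - userspace model of the updated-tree walk; invented
 * names, not the kernel code. */
#include <stdio.h>
#include <stdbool.h>

struct cgroup {
	struct cgroup *parent;
	bool on_updated_tree;	/* stands in for rstat's "already linked" check */
	const char *name;
};

/* Link @cgrp and its ancestors into the updated tree, stopping as soon
 * as an ancestor is already linked. */
static void model_rstat_updated(struct cgroup *cgrp)
{
	for (; cgrp; cgrp = cgrp->parent) {
		if (cgrp->on_updated_tree)	/* already linked, stop early */
			return;
		cgrp->on_updated_tree = true;
		printf("linked %s\n", cgrp->name);
	}
}

int main(void)
{
	struct cgroup root = { .name = "root" };
	struct cgroup a    = { .parent = &root, .name = "a" };
	struct cgroup leaf = { .parent = &a, .name = "a/leaf" };

	/* The first update (say, from memory) pays for the whole walk... */
	model_rstat_updated(&leaf);
	/* ...a consecutive update to the same cgroup (say, from io)
	 * returns immediately. With one shared tree that work is
	 * amortized; with per-controller trees each controller would
	 * repeat the walk once. */
	model_rstat_updated(&leaf);
	return 0;
}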

>
> > In-kernel memcg stats readers will be unaffected most of the time with
> > this change. The only difference will be when they flush, they will only
> > flush memcg stats.
>
> That "most of the time" is what depends on how active the other
> controllers' readers are.

Since readers of other controllers are only in userspace (AFAICT), I
think it's unlikely that they are correlated with in-kernel memcg stat
readers in general.

>
> > Here I am assuming you meant measurements in terms of cpu cost or do you
> > have something else in mind?
>
> I have in mind something like Tejun's point 2:
> | 2. It has noticeable benefits in the targeted use cases.
>
> The cover letter mentions some old problems (which may no longer be
> problems after the memcg flushing reworks) and it's not clear how the
> separation into per-controller trees impacts (today's) problems.
>
> (I can imagine that if the problem is stated as: io.stat readers are
> unnecessarily waiting for memory.stat flushing, then the benefit can be
> shown (unless io.stat readers could benefit from a flush triggered by
> e.g. memory). But I didn't get whether _that_ is the problem.)

Yeah, I hope/expect that the numbers will show that reading memcg stats
(from userspace or the kernel) becomes a bit faster, while reading
other subsystems' stats should be significantly faster (at least in
some cases). We will see how that turns out.
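
If it helps frame what to measure, here is a toy model of that exact
hypothesis (invented names and a made-up two-controller setup, not the
proposed kernel code): with per-controller trees, an io.stat reader
only pays for io's pending cgroups instead of everything pending:

/* toy_split.c - userspace model contrasting a shared updated tree with
 * per-controller trees. */
#include <stdio.h>
#include <stdbool.h>

#define NR_CGROUPS 4
enum ss { SS_MEMORY, SS_IO, NR_SS };

/* dirty[ss][cgrp]: per-controller dirty sets; a shared tree behaves as
 * if every flush walked both rows. */
static bool dirty[NR_SS][NR_CGROUPS];

static void model_updated(enum ss ss, int cgrp)
{
	dirty[ss][cgrp] = true;
}

/* Per-controller flush: only walk the requested controller's set. */
static int flush_one(enum ss ss)
{
	int cost = 0;

	for (int c = 0; c < NR_CGROUPS; c++) {
		if (dirty[ss][c]) {
			dirty[ss][c] = false;
			cost++;
		}
	}
	return cost;
}

int main(void)
{
	/* Heavy memory activity, a single io update. */
	for (int c = 0; c < NR_CGROUPS; c++)
		model_updated(SS_MEMORY, c);
	model_updated(SS_IO, 0);

	/* With split trees the io.stat reader visits one cgroup; a shared
	 * tree would have made it visit all five pending ones. */
	printf("io.stat read visited %d cgroups\n", flush_one(SS_IO));
	printf("memory.stat read visited %d cgroups\n", flush_one(SS_MEMORY));
	return 0;
}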
