Re: [PATCH 0/9 RFC] cgroup: separate rstat trees

Shakeel Butt <shakeel.butt@xxxxxxxxx> · Thu, 16 Jan 2025 11:03:04 -0800

Hi Michal,

On Thu, Jan 16, 2025 at 04:19:07PM +0100, Michal Koutný wrote:
> Hello.
> 
> On Mon, Jan 13, 2025 at 10:25:34AM -0800, Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
> > > and flushing effectiveness depends on how individual readers are
> > > correlated,
> > 
> > Sorry I am confused by the above statement, can you please expand on
> > what you meant by it?
> > 
> > > OTOH writer correlation affects
> > > updaters when extending the update tree.
> > 
> > Here I am confused about the difference between writer and updater.
> 
> reader -- a call site that'd need to call cgroup_rstat_flush() to
> 	calculate aggregated stats
> writer (or updater) -- a call site that calls cgroup_rstat_updated()
> 	when it modifies whatever datum
>

Ah, so writer and updater are the same.

> By correlated readers I meant that stats for multiple controllers are
> read close to each other (time-wise). The first such reader does the heavy
> lifting; subsequent readers enjoy quick access.
> (With per-controller flushing, each reader would need to do its own flush,
> and I suspect the total time is non-linear wrt the parts.)

This is a good point. In fact, in prod we are observing that on machines
with very active workloads, readers of different stats flush all the
subsystems even when their reads are very close together time-wise. In
addition, there are users who only read non-memory stats and still pay
the cost of memory flushing. Please note that memory stats flushing is
the most expensive one at the moment, and it does not make sense to
flush memory stats in the above two cases.

> 
> Similarly for writers, if multiple controllers' data change in a short
> window, only the first one has to construct the rstat tree from the top
> down to itself; the others update the same tree.

Another good point. From my observation, rstat tree insertion is very
cheap and its cost gets amortized, i.e. there are a lot of updates
between flushes, so the insertion cost is not noticeable at all, at
least in the perf traces.

> 
> > In-kernel memcg stats readers will be unaffected most of the time with
> > this change. The only difference will be when they flush, they will only
> > flush memcg stats.
> 
> That "most of the time" is what depends on how active other
> controllers' readers are.
> 
> > Here I am assuming you meant measurements in terms of cpu cost or do you
> > have something else in mind?
> 
> I have in mind something like Tejun's point 2:
> | 2. It has noticeable benefits in the targeted use cases.
> 
> The cover letter mentions some old problems (which may not be problems
> nowadays with memcg flushing reworks) and it's not clear how the
> separation into per-controller trees impacts (today's) problems.
> 
> (I can imagine if the problem is stated like: io.stat readers are
> unnecessarily waiting for memory.stat flushing, the benefit can be shown
> (unless io.stat readers could benefit from flushing triggered by e.g.
> memory).  But I didn't get if _that_ is the problem.)
> 

The cover letter of v2 has more information on the motivation. The
main motivation is the same as I described above, i.e. many applications
(stat monitors) have to flush all the subsystems even when they are
reading different subsystems' stats close together, and then there are
applications that read just cpu or io stats and still have to pay for
memcg stat flushing. This series targets these two very common
scenarios.

Thanks, Michal, for your time and comments.

Shakeel