Re: [PATCH 0/9 RFC] cgroup: separate rstat trees

JP Kobryn <inwardvessel@xxxxxxxxx> · Tue, 14 Jan 2025 17:33:17 -0800

Hi Michal,

On 1/13/25 10:25 AM, Shakeel Butt wrote:
On Wed, Jan 08, 2025 at 07:16:47PM +0100, Michal Koutný wrote:
Hello JP.

On Mon, Dec 23, 2024 at 05:13:53PM -0800, JP Kobryn <inwardvessel@xxxxxxxxx> wrote:
I've been experimenting with these changes to allow for separate
updating/flushing of cgroup stats per-subsystem.

Nice.

I reached a point where this started to feel stable in my local testing, so I
wanted to share and get feedback on this approach.

The split is not straight-forwardly an improvement --

The major improvement in my opinion is the performance isolation for
stats readers i.e. cpu stats readers do not need to flush memory stats.

there's at least
higher memory footprint

Yes this is indeed the case and JP, can you please give a ballmark on
the memory overhead?

Yes, the trade-off is using more memory to allow for separate trees.
With these patches the changes in allocated memory for the 
cgroup_rstat_cpu instances and their associated locks are:
static
	reduced by 58%
dynamic
	increased by 344%

The threefold increase on the dynamic side is attributed to now having 3 
rstat trees per cgroup (1 for base stats, 1 for memory, 1 for io), 
instead of originally just 1. The number will change if more subsystems 
start or stop using rstat in the future. Feel free to let me know if you 
would like to see the detailed breakdown of these values.

and flushing efffectiveness depends on how
individual readers are correlated,

Sorry I am confused by the above statement, can you please expand on
what you meant by it?

OTOH writer correlation affects
updaters when extending the update tree.

Here I am confused about the difference between writer and updater.

So a workload dependent effect
can go (in my theory) both sides.
There are also in-kernel consumers of stats, namely memory controller
that's been optimized over the years to balance the tradeoff between
precision and latency.

In-kernel memcg stats readers will be unaffected most of the time with
this change. The only difference will be when they flush, they will only
flush memcg stats.

So do you have any measurements (or expectations) that show how readers
or writers are affected?

Here I am assuming you meant measurements in terms of cpu cost or do you
have something else in mind?

Thanks a lot Michal for taking a look.