Re: [RFC PATCH v4 0/3] memcg weighted interleave mempolicy control

Gregory Price <gregory.price@xxxxxxxxxxxx> · Fri, 10 Nov 2023 17:29:25 -0500

On Fri, Nov 10, 2023 at 12:05:59PM -1000, tj@xxxxxxxxxx wrote:
> Hello,
> 
> On Thu, Nov 09, 2023 at 10:48:56PM +0000, John Groves wrote:
> > This approach checks all the important boxes: it only applies to apps where
> > it's enabled, the weighting can vary from one app to another, the
> > kernel is not affected, and the numa topology is not buried.
> 
> Can't it be a mempol property which is inherited by child processes? Then
> all you'll need is e.g. adding systemd support to configure this at service
> unit level. I'm having a bit of hard time seeing why this needs to be a
> cgroup feature when it doesn't involve dynamic resource accounting /
> enforcement at all.
> 
> Thanks.
> 
> -- 
> tejun

I did originally implement it this way, but note that it will either
require some creative extension of set_mempolicy or even set_mempolicy2
as proposed here:

https://lore.kernel.org/all/20231003002156.740595-1-gregory.price@xxxxxxxxxxxx/

One of the problems to consider is task migration.  If a task is
migrated from one socket to another (for example, by being moved to a
new cgroup with a different cpuset), the weights might be completely
nonsensical for the new allowed topology.
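
To illustrate the migration problem, here is a minimal userspace sketch
in Python.  It is not the kernel implementation: the node numbering, the
weight values, and the fallback weight of 1 for nodes with no configured
weight are all assumptions made up for the example, and it does not model
any nodemask remapping the kernel may do on a cpuset change.

```python
from itertools import islice


def weighted_interleave(weights, allowed_nodes):
    """Yield node ids in weighted round-robin order over allowed_nodes.

    weights maps node id -> pages to place on that node per round.
    Nodes without an entry fall back to a weight of 1 (an assumption
    for this sketch, not a statement about kernel behaviour).
    """
    nodes = sorted(allowed_nodes)
    if not nodes:
        raise ValueError("no allowed nodes")
    while True:
        for node in nodes:
            # Clamp to at least 1 so every allowed node makes progress.
            for _ in range(max(1, weights.get(node, 1))):
                yield node


# Hypothetical topology: nodes 0 (DRAM) and 1 (CXL) hang off socket 0,
# nodes 2 (DRAM) and 3 (CXL) hang off socket 1.
weights = {0: 3, 1: 1}  # tuned for socket 0: 3 DRAM pages per CXL page

# Task confined to socket 0: the weights do what was intended.
before = list(islice(weighted_interleave(weights, {0, 1}), 8))

# Task moved to a cpuset that only allows socket 1: the same weights
# say nothing about nodes 2 and 3, so the intended 3:1 split is lost.
after = list(islice(weighted_interleave(weights, {2, 3}), 8))

print("before migration:", before)
print("after migration: ", after)
```

In this sketch the task quietly degrades to a plain 1:1 interleave after
the move, and nothing outside the task can supply weights that fit the
new nodes.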

Unfortunately mpol has no way of being changed from outside the task
itself once it's applied, other than changing its nodemasks via cpusets.

So one concrete use case: kubernetes might change cpusets or move
tasks from one cgroup to another, or a vm might be migrated from one set
of nodes to another (technically not mutually exclusive here).  Some
memory policy settings (like weights) may no longer apply when this
happens, so it would be preferable to have a way to change them.

~Gregory