Hello, Gregory.

On Fri, Nov 10, 2023 at 05:29:25PM -0500, Gregory Price wrote:
> I did originally implement it this way, but note that it will either
> require some creative extension of set_mempolicy or even set_mempolicy2
> as proposed here:
>
> https://lore.kernel.org/all/20231003002156.740595-1-gregory.price@xxxxxxxxxxxx/
>
> One of the problems to consider is task migration. If a task is
> migrated from one socket to another, for example by being moved to a new
> cgroup with a different cpuset - the weights might be completely nonsensical
> for the new allowed topology.
>
> Unfortunately mpol has no way of being changed from outside the task
> itself once it's applied, other than changing its nodemasks via cpusets.

Maybe it's time to add one?

> So one concrete use case: kubernetes might like to change cpusets or move
> tasks from one cgroup to another, or a vm might be migrated from one set
> of nodes to another (technically not mutually exclusive here). Some
> memory policy settings (like weights) may no longer apply when this
> happens, so it would be preferable to have a way to change them.

Neither covers all use cases. As you noted in your mempolicy message, if the application wants finer-grained control, the cgroup interface isn't great. In general, changes that are dynamically initiated by the application itself aren't a great fit for cgroup.

I'm generally pretty wary of adding non-resource group configuration interfaces, especially when they don't have a counterpart in the regular per-process/thread API, for a few reasons:

1. People sometimes try to add these through cgroup because it seems easier to add new features that way. That may be true to some degree, but shortcuts often aren't very conducive to long-term maintainability.

2. As noted above, relying on cgroup alone often excludes a significant portion of use cases.
   Not all systems enable cgroups, and programmatic access from the target processes/threads is coarse-grained and can be really awkward.

3. Cgroup can be convenient when a group-wide config change is necessary. However, we really don't want to keep adding kernel interfaces just for changing configs for a group of threads. For config changes that aren't high-frequency, userspace iterating over the member processes and applying the changes where possible is usually good enough. This usually involves looping until no new process is found. If the looping is problematic, the cgroup freezer can be used to stop all member threads at once, which provides atomicity as well.

Thanks.

--
tejun
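Point 3 describes a purely userspace alternative to a new group interface. The following is a minimal Python sketch of that approach. It assumes a cgroup v2 hierarchy, where a group directory exposes `cgroup.procs`, `cgroup.freeze` and `cgroup.events`. The function names are hypothetical, and the `apply` callback stands in for whatever per-process change is wanted. As the quoted text notes, a task's mempolicy cannot currently be changed from outside the task, so `apply` would be limited to settings that can be changed externally. For simplicity, the sketch polls `cgroup.events` rather than waiting for file change notifications.

```python
import os
import time
from typing import Callable, Set

# Hypothetical helpers for a cgroup v2 directory such as /sys/fs/cgroup/<group>.
# The names are illustrative only; this is not an existing API.


def apply_to_cgroup(cgroup_dir: str, apply: Callable[[int], None]) -> Set[int]:
    """Call apply(pid) once for every process in the cgroup, rescanning
    cgroup.procs until a full pass finds no PID that was not handled yet."""
    handled: Set[int] = set()
    while True:
        with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
            pids = {int(line) for line in f if line.strip()}
        new_pids = pids - handled
        if not new_pids:
            return handled  # stable: no new process appeared since the last pass
        for pid in sorted(new_pids):
            try:
                apply(pid)
            except ProcessLookupError:
                pass  # the process exited between the read and the apply
            handled.add(pid)


def set_frozen(cgroup_dir: str, frozen: bool, timeout: float = 5.0) -> None:
    """Write cgroup.freeze, then poll cgroup.events until the state settles."""
    with open(os.path.join(cgroup_dir, "cgroup.freeze"), "w") as f:
        f.write("1" if frozen else "0")
    want = "frozen 1" if frozen else "frozen 0"
    deadline = time.monotonic() + timeout
    while True:
        with open(os.path.join(cgroup_dir, "cgroup.events")) as f:
            if want in f.read().splitlines():
                return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{cgroup_dir}: never reached '{want}'")
        time.sleep(0.01)


def apply_while_frozen(cgroup_dir: str, apply: Callable[[int], None]) -> Set[int]:
    """Freeze the group so member tasks cannot run (and so cannot fork)
    during the update, then thaw it even if apply() raises."""
    set_frozen(cgroup_dir, True)
    try:
        return apply_to_cgroup(cgroup_dir, apply)
    finally:
        set_frozen(cgroup_dir, False)
```

The rescan loop alone handles processes that fork during the update, because each pass only acts on PIDs it has not seen before. Freezing is the optional step for cases where the looping itself is a problem.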