On Tue, Jul 12, 2022 at 9:32 AM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> Hello,
>
> On Tue, Jul 12, 2022 at 08:25:24AM -0700, Shakeel Butt wrote:
> > Another very obvious example is the filesystem shared between multiple
> > jobs. We had a similar discussion [1] on LRU reparenting patch series.
>
> Hmm... if I'm understanding correctly, what's discussed in [1] can be solved
> with proper reparenting and nesting, right?
>

To some extent i.e. the zombies will go away but the accounting/stats
of the sub-jobs will be nondeterministic until all the possible shared
stuff is reparented. Let me give a more concrete example below.

> > For this use-case internally we have a memcg= mount option where the
> > given memcg is the common ancestor (think of pod in k8s environment)
> > of the jobs who are sharing the filesystem.
>
> Can you elaborate a bit more on this? We've never really supported correctly
> accounting pages shared across cgroups because it can be very complicating
> and the use cases aren't that wide-spread. What's being shared? How big is
> the shared portion in relation to total memory usage? What's the cgroup
> topology like?
>

One use-case we have is a build & test service which runs independent
builds and tests but all the build utilities (compiler, linker,
libraries) are shared between those builds and tests.

In terms of topology, the service has a top level cgroup (P) and all
independent builds and tests run in their own cgroup under P. These
builds/tests continuously come and go.

This service continuously monitors all the builds/tests running and
may kill some based on some criteria which includes memory usage.
However the memory usage is nondeterministic and killing a specific
build/test may not really free memory if most of the memory charged to
it is from shared build utilities.
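
To make the nondeterminism in the build & test example a bit more
concrete, here is a toy model, not kernel code. It assumes the usual
first-touch behaviour, where a shared page is charged to whichever
cgroup instantiates it first and stays charged there. The job names,
the sizes in MB and both helper functions are made up purely for
illustration.

```python
def charge_first_touch(start_order, private_mb, shared_mb):
    """Toy model: per-job charged memory (MB) under first-touch charging.

    The first job to start pulls the shared build utilities into memory
    and is charged for all of them; later jobs reuse those pages for free.
    """
    usage = {}
    shared_owner = None
    for job in start_order:
        usage[job] = private_mb[job]
        if shared_owner is None:
            shared_owner = job
            usage[job] += shared_mb
    return usage, shared_owner


def freed_by_kill(victim, usage, shared_owner, private_mb):
    """Toy model: memory (MB) actually released when `victim` is killed.

    Shared pages that other running jobs still use stay resident, so
    killing the job they are charged to only releases its private memory.
    """
    others_running = any(job != victim for job in usage)
    if victim == shared_owner and others_running:
        return private_mb[victim]
    return usage[victim]


if __name__ == "__main__":
    private_mb = {"build-a": 200, "test-b": 300}  # hypothetical sizes
    shared_mb = 1000                              # compiler, linker, libraries

    # Same two jobs, different start order, different per-job numbers.
    for order in (["build-a", "test-b"], ["test-b", "build-a"]):
        usage, owner = charge_first_touch(order, private_mb, shared_mb)
        print(order, usage, "shared charged to:", owner)

    # Killing the job with the largest charge frees far less than it shows.
    usage, owner = charge_first_touch(["build-a", "test-b"], private_mb, shared_mb)
    print("freed by killing build-a:", freed_by_kill("build-a", usage, owner, private_mb))
```

In this model the same pair of jobs reports very different usage
depending only on which one started first, and killing the job with
the largest charge releases only its private part while the other job
keeps the shared utilities in memory.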