Re: [PATCH bpf-next 0/5] bpf: BPF specific memory allocator.

Shakeel Butt <shakeelb@xxxxxxxxxx> · Tue, 12 Jul 2022 10:26:22 -0700

On Tue, Jul 12, 2022 at 9:32 AM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> Hello,
>
> On Tue, Jul 12, 2022 at 08:25:24AM -0700, Shakeel Butt wrote:
> > Another very obvious example is the filesystem shared between multiple
> > jobs. We had a similar discussion [1] on LRU reparenting patch series.
>
> Hmm... if I'm understanding correctly, what's discussed in [1] can be solved
> with proper reparenting and nesting, right?
>

To some extent, i.e., the zombies will go away, but the accounting/stats
of the sub-jobs will be nondeterministic until everything that may be
shared is reparented. Let me give a more concrete example below.

> > For this use-case internally we have a memcg= mount option where the
> > given memcg is the common ancestor (think of pod in k8s environment)
> > of the jobs who are sharing the filesystem.
>
> Can you elaborate a bit more on this? We've never really supported correctly
> accounting pages shared across cgroups because it can be very complicating
> and the use cases aren't that wide-spread. What's being shared? How big is
> the shared portion in relation to total memory usage? What's the cgroup
> topology like?
>

One use-case we have is a build & test service which runs independent
builds and tests but all the build utilities (compiler, linker,
libraries) are shared between those builds and tests.

In terms of topology, the service has a top-level cgroup (P) and all
independent builds and tests run in their own cgroup under P. These
builds/tests continuously come and go.

This service continuously monitors all the running builds/tests and
may kill some based on criteria that include memory usage. However,
the memory usage of an individual build/test is nondeterministic, and
killing a specific build/test may not really free memory if most of
the memory charged to it comes from the shared build utilities.
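
To illustrate the effect, here is a toy model in Python. It is not
kernel code. It assumes that shared page cache is charged to whichever
cgroup touches it first, and that those pages stay in use as long as
any other job under P is running. The job names, page counts, and both
helper functions are made up for the illustration.

```python
# Toy model only, not kernel code. Assumption: every shared page is charged
# to the cgroup that touches it first and stays charged there.


def charge(jobs, private_pages, shared_pages):
    """Return {job: charged pages}.

    `jobs` is ordered by who touched the shared build utilities first.
    """
    usage = {job: private_pages for job in jobs}
    if jobs:
        # The first toucher pays for all of the shared pages.
        usage[jobs[0]] += shared_pages
    return usage


def reclaimable_after_kill(victim, jobs, private_pages, shared_pages):
    """Pages that can actually go away once `victim` is killed."""
    others_remain = any(job != victim for job in jobs)
    # Shared pages are still in use while any other job keeps running.
    if others_remain:
        return private_pages
    return private_pages + shared_pages


# Same two jobs, same work, different start order (hypothetical numbers).
run1 = charge(["build_a", "test_b"], private_pages=100, shared_pages=900)
run2 = charge(["test_b", "build_a"], private_pages=100, shared_pages=900)
print(run1)
print(run2)

# build_a looks like the biggest user in run1, yet killing it frees little.
freed = reclaimable_after_kill("build_a", ["build_a", "test_b"], 100, 900)
print(freed)
```

In this model the per-job numbers swap depending only on start order,
while the total under P stays the same in both runs.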