Re: [PATCH v3 2/3] hugetlb: memcg: account hugetlb-backed memory in memory controller

Michal Hocko <mhocko@xxxxxxxx> · Tue, 3 Oct 2023 14:58:58 +0200

On Mon 02-10-23 17:18:27, Nhat Pham wrote:
> Currently, hugetlb memory usage is not acounted for in the memory
> controller, which could lead to memory overprotection for cgroups with
> hugetlb-backed memory. This has been observed in our production system.
> 
> For instance, here is one of our usecases: suppose there are two 32G
> containers. The machine is booted with hugetlb_cma=6G, and each
> container may or may not use up to 3 gigantic page, depending on the
> workload within it. The rest is anon, cache, slab, etc. We can set the
> hugetlb cgroup limit of each cgroup to 3G to enforce hugetlb fairness.
> But it is very difficult to configure memory.max to keep overall
> consumption, including anon, cache, slab etc. fair.
> 
> What we have had to resort to is to constantly poll hugetlb usage and
> readjust memory.max. Similar procedure is done to other memory limits
> (memory.low for e.g). However, this is rather cumbersome and buggy.

Could you expand some more on how this _helps_ memory.low? The
hugetlb memory is not reclaimable so whatever portion of its memcg
consumption will be "protected from the reclaim". Consider this
	      parent
	/		\
       A		 B
       low=50%		 low=0
       current=40%	 current=60%

We have an external memory pressure and the reclaim should prefer B as A
is under its low limit, correct? But now consider that the predominant
consumption of B is hugetlb which would mean the memory reclaim cannot
do much for B and so the A's protection might be breached.

As an admin (or a tool) you need to know about hugetlb as a potential
contributor to this behavior (sure mlocked memory would behave the same
but mlock rarely consumes huge amount of memory in my experience).
Without the accounting there might not be any external pressure in the
first place. 

All that being said, I do not see how adding hugetlb into accounting
makes low, min limits management any easier.

-- 
Michal Hocko
SUSE Labs