On Mon 02-10-23 17:18:27, Nhat Pham wrote:
> Currently, hugetlb memory usage is not accounted for in the memory
> controller, which could lead to memory overprotection for cgroups with
> hugetlb-backed memory. This has been observed in our production system.
>
> For instance, here is one of our usecases: suppose there are two 32G
> containers. The machine is booted with hugetlb_cma=6G, and each
> container may or may not use up to 3 gigantic pages, depending on the
> workload within it. The rest is anon, cache, slab, etc. We can set the
> hugetlb cgroup limit of each cgroup to 3G to enforce hugetlb fairness.
> But it is very difficult to configure memory.max to keep overall
> consumption, including anon, cache, slab, etc., fair.
>
> What we have had to resort to is to constantly poll hugetlb usage and
> readjust memory.max. A similar procedure is done for other memory
> limits (e.g. memory.low). However, this is rather cumbersome and buggy.

Could you expand some more on how this _helps_ memory.low? The hugetlb
memory is not reclaimable, so whatever portion of the memcg consumption
it accounts for will be "protected from the reclaim" anyway.

Consider this

            parent
           /      \
          A        B
    low=50%        low=0
current=40%        current=60%

We have external memory pressure and the reclaim should prefer B, as A
is under its low limit, correct? But now consider that the predominant
consumption of B is hugetlb, which would mean the memory reclaim cannot
do much for B, and so A's protection might be breached.

As an admin (or a tool) you need to know about hugetlb as a potential
contributor to this behavior (sure, mlocked memory would behave the
same, but mlock rarely consumes a huge amount of memory in my
experience). Without the accounting there might not be any external
pressure in the first place.

All that being said, I do not see how adding hugetlb into the
accounting makes low/min limits management any easier.
--
Michal Hocko
SUSE Labs
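
To make the scenario in the diagram concrete, here is a toy userspace
sketch. It is plain C, not kernel code, and only a rough stand-in for
the kernel's actual proportional-protection and low-breach retry logic.
The numbers are the ones from the example above; the per-child
"reclaimable" share is an assumption used to model B being
predominantly hugetlb-backed.

/*
 * Illustrative userspace sketch, not kernel code: it models the
 * two-child scenario above with made-up units to show how
 * unreclaimable (hugetlb) memory in B can force reclaim to breach
 * A's memory.low protection.
 */
#include <stdio.h>

struct cg {
	const char *name;
	long low;		/* memory.low, in arbitrary units */
	long current;		/* current charge */
	long reclaimable;	/* portion of 'current' reclaim can actually free */
};

static long take(struct cg *c, long want, long avail)
{
	long got = want < avail ? want : avail;

	if (got > c->reclaimable)
		got = c->reclaimable;
	c->current -= got;
	c->reclaimable -= got;
	return got;
}

int main(void)
{
	/* B is predominantly hugetlb, so almost nothing of it is reclaimable. */
	struct cg a = { "A", 50, 40, 40 };
	struct cg b = { "B",  0, 60,  5 };
	struct cg *cgs[] = { &b, &a };
	long target = 20, freed = 0;
	int i;

	/* First pass: respect memory.low, only reclaim unprotected usage. */
	for (i = 0; i < 2 && freed < target; i++) {
		struct cg *c = cgs[i];
		long unprotected = c->current > c->low ? c->current - c->low : 0;

		freed += take(c, target - freed, unprotected);
	}

	/* Second pass: still short of the target, so protections get breached. */
	for (i = 0; i < 2 && freed < target; i++) {
		struct cg *c = cgs[i];
		long got = take(c, target - freed, c->current);

		if (got)
			printf("memory.low of %s breached by %ld units\n",
			       c->name, got);
		freed += got;
	}
	return 0;
}

Running it prints that A's memory.low gets breached once B's small
reclaimable share is exhausted, which is the failure mode described
above.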