On 2021/2/9 3:52, Mike Kravetz wrote: > On 1/23/21 1:31 AM, Miaohe Lin wrote: >> The current implementation of hugetlb_cgroup for shared mappings could have >> different behavior. Consider the following two scenarios: >> >> 1.Assume initial css reference count of hugetlb_cgroup is 1: >> 1.1 Call hugetlb_reserve_pages with from = 1, to = 2. So css reference >> count is 2 associated with 1 file_region. >> 1.2 Call hugetlb_reserve_pages with from = 2, to = 3. So css reference >> count is 3 associated with 2 file_region. >> 1.3 coalesce_file_region will coalesce these two file_regions into one. >> So css reference count is 3 associated with 1 file_region now. >> >> 2.Assume initial css reference count of hugetlb_cgroup is 1 again: >> 2.1 Call hugetlb_reserve_pages with from = 1, to = 3. So css reference >> count is 2 associated with 1 file_region. >> >> Therefore, we might have one file_region while holding one or more css >> reference counts. This inconsistency could lead to unbalanced css_put(). >> If we do css_put one by one (i.g. hole punch case), scenario 2 would put >> one more css reference. If we do css_put all together (i.g. truncate case), >> scenario 1 will leak one css reference. > > Sorry for the delay in replying. This is tricky code and I needed some quiet > time to study it. > That's fine. I was trying to catch more buggy case too. > I agree that the issue described exists. Can you describe what a user would > see in the above imbalance scenarios? What happens if we do one too many > css_put calls? What happens if we leak the reference and do not do the > required number of css_puts? > The imbalanced css_get/css_put would result in a non-zero reference when we try to destroy the hugetlb cgroup. The hugetlb cgroup dir is removed __but__ associated resource is not freed. This might result in OOM or can not create a new hugetlb cgroup in a really busy workload finally. > The code changes look correct. > > I just wish this code was not so complicated. I think the private mapping > case could be simplified to only take a single css_ref per reserve map. Could you explain this more? It seems one reserve map already takes a single css_ref. And a hugepage outside reservation would take a single css_ref too. > However, for shared mappings we need to track each individual reservation > which adds the complexity. I can not think of a better way to do things. > I can't figure out one too. And the fix might make the code more complex. :( > Please update commit message with an explanation of what users might see > because of this issue and resubmit as a patch. > Will do. Thanks. > Thanks, > Many thanks for reply. :)