On 2/8/21 7:27 PM, Miaohe Lin wrote: > On 2021/2/9 3:52, Mike Kravetz wrote: >> On 1/23/21 1:31 AM, Miaohe Lin wrote: >>> The current implementation of hugetlb_cgroup for shared mappings could have >>> different behavior. Consider the following two scenarios: >>> >>> 1.Assume initial css reference count of hugetlb_cgroup is 1: >>> 1.1 Call hugetlb_reserve_pages with from = 1, to = 2. So css reference >>> count is 2 associated with 1 file_region. >>> 1.2 Call hugetlb_reserve_pages with from = 2, to = 3. So css reference >>> count is 3 associated with 2 file_region. >>> 1.3 coalesce_file_region will coalesce these two file_regions into one. >>> So css reference count is 3 associated with 1 file_region now. >>> >>> 2.Assume initial css reference count of hugetlb_cgroup is 1 again: >>> 2.1 Call hugetlb_reserve_pages with from = 1, to = 3. So css reference >>> count is 2 associated with 1 file_region. >>> >>> Therefore, we might have one file_region while holding one or more css >>> reference counts. This inconsistency could lead to unbalanced css_put(). >>> If we do css_put one by one (i.g. hole punch case), scenario 2 would put >>> one more css reference. If we do css_put all together (i.g. truncate case), >>> scenario 1 will leak one css reference. >> >> Sorry for the delay in replying. This is tricky code and I needed some quiet >> time to study it. >> > > That's fine. I was trying to catch more buggy case too. > >> I agree that the issue described exists. Can you describe what a user would >> see in the above imbalance scenarios? What happens if we do one too many >> css_put calls? What happens if we leak the reference and do not do the >> required number of css_puts? >> > > The imbalanced css_get/css_put would result in a non-zero reference when we try to > destroy the hugetlb cgroup. The hugetlb cgroup dir is removed __but__ associated > resource is not freed. This might result in OOM or can not create a new hugetlb cgroup > in a really busy workload finally. > >> The code changes look correct. >> >> I just wish this code was not so complicated. I think the private mapping >> case could be simplified to only take a single css_ref per reserve map. > > Could you explain this more? > It seems one reserve map already takes a single css_ref. And a hugepage outside > reservation would take a single css_ref too. Let me preface this by saying that my cgroup knowledge is limited. For private mappings, all reservations will be associated with the same cgroup. This is because, only one process can access the mapping. Since there is only one process, we only need to hold one css reference. Individual counters can be incremented as needed without increasing the css reference count. We take a reference when the reserv map is created and drop the reference when it is deleted. This does not work for shared mappings as you can have multiple processes in multiple cgroups taking reservations on the same file. This is why you need per-reservation reference accounting in this case. -- Mike Kravetz > >> However, for shared mappings we need to track each individual reservation >> which adds the complexity. I can not think of a better way to do things. >> > > I can't figure out one too. And the fix might make the code more complex. :( > >> Please update commit message with an explanation of what users might see >> because of this issue and resubmit as a patch. >> > > Will do. Thanks. > >> Thanks, >> > > Many thanks for reply. :) >