On 8/13/19 4:54 PM, Mike Kravetz wrote:
> On 8/8/19 4:13 PM, Mina Almasry wrote:
>> For shared mappings, the pointer to the hugetlb_cgroup to uncharge lives
>> in the resv_map entries, in file_region->reservation_counter.
>>
>> When a file_region entry is added to the resv_map via region_add, we
>> also charge the appropriate hugetlb_cgroup and put the pointer to that
>> in file_region->reservation_counter. This is slightly delicate since we
>> need to not modify the resv_map until we know that charging the
>> reservation has succeeded. If charging doesn't succeed, we report the
>> error to the caller, so that the kernel fails the reservation.
>
> I wish we did not need to modify these region_() routines as they are
> already difficult to understand. However, I see no other way with the
> desired semantics.
>

I suspect you have considered this, but what about using the return value
from region_chg() in hugetlb_reserve_pages() to charge reservation limits?
There is a VERY SMALL race where the value could be too large, but that
can be checked and adjusted at region_add time as is done with normal
accounting today.

If the question is, where would we store the information to uncharge?, then
we can hang a structure off the vma. This would be similar to what is done
for private mappings. In fact, I would suggest making them both use a new
cgroup reserve structure hanging off the vma.

One issue I see is what to do if a vma is split? The private mapping case
'should' handle this today, but I would not be surprised if such code is
missing or incorrect.

-- 
Mike Kravetz