On Wed, Aug 14, 2019 at 9:46 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote: > > On 8/13/19 4:54 PM, Mike Kravetz wrote: > > On 8/8/19 4:13 PM, Mina Almasry wrote: > >> For shared mappings, the pointer to the hugetlb_cgroup to uncharge lives > >> in the resv_map entries, in file_region->reservation_counter. > >> > >> When a file_region entry is added to the resv_map via region_add, we > >> also charge the appropriate hugetlb_cgroup and put the pointer to that > >> in file_region->reservation_counter. This is slightly delicate since we > >> need to not modify the resv_map until we know that charging the > >> reservation has succeeded. If charging doesn't succeed, we report the > >> error to the caller, so that the kernel fails the reservation. > > > > I wish we did not need to modify these region_() routines as they are > > already difficult to understand. However, I see no other way with the > > desired semantics. > > > > I suspect you have considered this, but what about using the return value > from region_chg() in hugetlb_reserve_pages() to charge reservation limits? > There is a VERY SMALL race where the value could be too large, but that > can be checked and adjusted at region_add time as is done with normal > accounting today. I have not actually until now; I didn't consider doing stuff with the resv_map while not holding onto the resv_map->lock. I guess that's the small race you're talking about. Seems fine to me, but I'm more worried about hanging off the vma below. > If the question is, where would we store the information > to uncharge?, then we can hang a structure off the vma. This would be > similar to what is done for private mappings. In fact, I would suggest > making them both use a new cgroup reserve structure hanging off the vma. > I actually did consider hanging off the info to uncharge off the vma, but I didn't for a couple of reasons: 1. region_del is called from hugetlb_unreserve_pages, and I don't have access to the vma there. Maybe there is a way to query the proper vma I don't know about? 2. hugetlb_reserve_pages seems to be able to conduct a reservation with a NULL *vma. Not sure what to do in that case. Is there a way to get around these that I'm missing here? FWIW I think tracking is better in resv_map since the reservations are in resv_map themselves. If I do another structure, then for each reservation there will be an entry in resv_map and an entry in the new structure and they need to be kept in sync and I need to handle errors for when they get out of sync. > One issue I see is what to do if a vma is split? The private mapping case > 'should' handle this today, but I would not be surprised if such code is > missing or incorrect. > > -- > Mike Kravetz