On 8/8/19 4:13 PM, Mina Almasry wrote:
> For shared mappings, the pointer to the hugetlb_cgroup to uncharge lives
> in the resv_map entries, in file_region->reservation_counter.
>
> When a file_region entry is added to the resv_map via region_add, we
> also charge the appropriate hugetlb_cgroup and put the pointer to that
> in file_region->reservation_counter. This is slightly delicate since we
> need to not modify the resv_map until we know that charging the
> reservation has succeeded. If charging doesn't succeed, we report the
> error to the caller, so that the kernel fails the reservation.

I wish we did not need to modify these region_() routines as they are
already difficult to understand.  However, I see no other way with the
desired semantics.

> On region_del, which is when the hugetlb memory is unreserved, we delete
> the file_region entry in the resv_map, but also uncharge the
> file_region->reservation_counter.
>
> ---
>  mm/hugetlb.c | 208 +++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 170 insertions(+), 38 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 235996aef6618..d76e3137110ab 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -242,8 +242,72 @@ struct file_region {
>          struct list_head link;
>          long from;
>          long to;
> +#ifdef CONFIG_CGROUP_HUGETLB
> +        /*
> +         * On shared mappings, each reserved region appears as a struct
> +         * file_region in resv_map. These fields hold the info needed to
> +         * uncharge each reservation.
> +         */
> +        struct page_counter *reservation_counter;
> +        unsigned long pages_per_hpage;
> +#endif
>  };
>
> +/* Must be called with resv->lock held. Calling this with dry_run == true will
> + * count the number of pages added but will not modify the linked list.
> + */
> +static long consume_regions_we_overlap_with(struct file_region *rg,
> +                                            struct list_head *head,
> +                                            long f, long *t,
> +                                            struct hugetlb_cgroup *h_cg,
> +                                            struct hstate *h,
> +                                            bool dry_run)
> +{
> +        long add = 0;
> +        struct file_region *trg = NULL, *nrg = NULL;
> +
> +        /* Consume any regions we now overlap with. */
> +        nrg = rg;
> +        list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
> +                if (&rg->link == head)
> +                        break;
> +                if (rg->from > *t)
> +                        break;
> +
> +                /* If this area reaches higher then extend our area to
> +                 * include it completely. If this is not the first area
> +                 * which we intend to reuse, free it.
> +                 */
> +                if (rg->to > *t)
> +                        *t = rg->to;
> +                if (rg != nrg) {
> +                        /* Decrement return value by the deleted range.
> +                         * Another range will span this area so that by
> +                         * end of routine add will be >= zero
> +                         */
> +                        add -= (rg->to - rg->from);
> +                        if (!dry_run) {
> +                                list_del(&rg->link);
> +                                kfree(rg);

Is it possible that the region struct we are deleting pointed to a
reservation_counter?  Perhaps even for another cgroup?
Just concerned with the way regions are coalesced that we may be
deleting counters.

-- 
Mike Kravetz