On Tue, Aug 13, 2019 at 4:54 PM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>
> On 8/8/19 4:13 PM, Mina Almasry wrote:
> > For shared mappings, the pointer to the hugetlb_cgroup to uncharge lives
> > in the resv_map entries, in file_region->reservation_counter.
> >
> > When a file_region entry is added to the resv_map via region_add, we
> > also charge the appropriate hugetlb_cgroup and put the pointer to that
> > in file_region->reservation_counter. This is slightly delicate since we
> > need to not modify the resv_map until we know that charging the
> > reservation has succeeded. If charging doesn't succeed, we report the
> > error to the caller, so that the kernel fails the reservation.
>
> I wish we did not need to modify these region_() routines as they are
> already difficult to understand.  However, I see no other way with the
> desired semantics.
>
> > On region_del, which is when the hugetlb memory is unreserved, we delete
> > the file_region entry in the resv_map, but also uncharge the
> > file_region->reservation_counter.
> >
> > ---
> >  mm/hugetlb.c | 208 +++++++++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 170 insertions(+), 38 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 235996aef6618..d76e3137110ab 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -242,8 +242,72 @@ struct file_region {
> >  	struct list_head link;
> >  	long from;
> >  	long to;
> > +#ifdef CONFIG_CGROUP_HUGETLB
> > +	/*
> > +	 * On shared mappings, each reserved region appears as a struct
> > +	 * file_region in resv_map. These fields hold the info needed to
> > +	 * uncharge each reservation.
> > +	 */
> > +	struct page_counter *reservation_counter;
> > +	unsigned long pages_per_hpage;
> > +#endif
> >  };
> >
> > +/* Must be called with resv->lock held. Calling this with dry_run == true will
> > + * count the number of pages added but will not modify the linked list.
> > + */
> > +static long consume_regions_we_overlap_with(struct file_region *rg,
> > +					    struct list_head *head,
> > +					    long f, long *t,
> > +					    struct hugetlb_cgroup *h_cg,
> > +					    struct hstate *h,
> > +					    bool dry_run)
> > +{
> > +	long add = 0;
> > +	struct file_region *trg = NULL, *nrg = NULL;
> > +
> > +	/* Consume any regions we now overlap with. */
> > +	nrg = rg;
> > +	list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
> > +		if (&rg->link == head)
> > +			break;
> > +		if (rg->from > *t)
> > +			break;
> > +
> > +		/* If this area reaches higher then extend our area to
> > +		 * include it completely.  If this is not the first area
> > +		 * which we intend to reuse, free it.
> > +		 */
> > +		if (rg->to > *t)
> > +			*t = rg->to;
> > +		if (rg != nrg) {
> > +			/* Decrement return value by the deleted range.
> > +			 * Another range will span this area so that by
> > +			 * end of routine add will be >= zero
> > +			 */
> > +			add -= (rg->to - rg->from);
> > +			if (!dry_run) {
> > +				list_del(&rg->link);
> > +				kfree(rg);
>
> Is it possible that the region struct we are deleting pointed to
> a reservation_counter?  Perhaps even for another cgroup?
> Just concerned with the way regions are coalesced that we may be
> deleting counters.
>

Yep, that needs to be handled I think. Thanks for catching!

> --
> Mike Kravetz
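
Roughly, I'm thinking the coalescing path needs a guard so that we only
merge adjacent regions that would be uncharged against the same counter.
An untested sketch of what I have in mind (the helper name is my own
invention, not in the patch above):

static bool has_same_uncharge_info(struct file_region *rg,
				   struct file_region *org)
{
#ifdef CONFIG_CGROUP_HUGETLB
	/* Two regions may only be coalesced if they are charged to the
	 * same page_counter (i.e. the same cgroup) with the same huge
	 * page size; otherwise merging would lose uncharge info.
	 */
	return rg && org &&
	       rg->reservation_counter == org->reservation_counter &&
	       rg->pages_per_hpage == org->pages_per_hpage;
#else
	/* Without CONFIG_CGROUP_HUGETLB there is no uncharge info to
	 * preserve, so coalescing is always safe.
	 */
	return true;
#endif
}

Then in consume_regions_we_overlap_with(), skip the list_del()/kfree()
merge when !has_same_uncharge_info(rg, nrg), leaving both regions (and
their counters) intact. I'll work that into the next version.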