On 12/13/2017 11:50 PM, Michal Hocko wrote:
> On Wed 13-12-17 16:45:55, Mike Kravetz wrote:
>> On 12/04/2017 06:01 AM, Michal Hocko wrote:
>>> From: Michal Hocko <mhocko@xxxxxxxx>
>>>
>>> alloc_surplus_huge_page increases the pool size and the number of
>>> surplus pages opportunistically to prevent races with the pool size
>>> change. See d1c3fb1f8f29 ("hugetlb: introduce nr_overcommit_hugepages
>>> sysctl") for more details.
>>>
>>> The resulting code is unnecessarily hairy, causes code duplication and
>>> doesn't allow sharing the allocation paths. Moreover, pool size changes
>>> tend to be very seldom, so optimizing for them is not really reasonable.
>>> Simplify the code and allow allocating a fresh surplus page as long as
>>> we are under the overcommit limit, then recheck the condition after the
>>> allocation and drop the new page if the situation has changed. This
>>> should provide a reasonable guarantee that abrupt allocation requests
>>> will not go way off the limit.
>>>
>>> If we consider races with the pool shrinking and enlarging then we
>>> should be reasonably safe as well. In the first case we are off by one
>>> in the worst case and the second case should work OK because the page
>>> is not yet visible. We can waste CPU cycles for the allocation but that
>>> should be acceptable for a relatively rare condition.
>>>
>>> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
>>> ---
>>>  mm/hugetlb.c | 60 +++++++++++++++++++++---------------------------------------
>>>  1 file changed, 21 insertions(+), 39 deletions(-)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index a1b8b2888ec9..0c7dc269b6c0 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -1538,62 +1538,44 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
>>>  static struct page *__alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
>>>  		int nid, nodemask_t *nmask)
>>>  {
>>> -	struct page *page;
>>> -	unsigned int r_nid;
>>> +	struct page *page = NULL;
>>>
>>>  	if (hstate_is_gigantic(h))
>>>  		return NULL;
>>>
>>> -	/*
>>> -	 * Assume we will successfully allocate the surplus page to
>>> -	 * prevent racing processes from causing the surplus to exceed
>>> -	 * overcommit
>>> -	 *
>>> -	 * This however introduces a different race, where a process B
>>> -	 * tries to grow the static hugepage pool while alloc_pages() is
>>> -	 * called by process A. B will only examine the per-node
>>> -	 * counters in determining if surplus huge pages can be
>>> -	 * converted to normal huge pages in adjust_pool_surplus(). A
>>> -	 * won't be able to increment the per-node counter, until the
>>> -	 * lock is dropped by B, but B doesn't drop hugetlb_lock until
>>> -	 * no more huge pages can be converted from surplus to normal
>>> -	 * state (and doesn't try to convert again). Thus, we have a
>>> -	 * case where a surplus huge page exists, the pool is grown, and
>>> -	 * the surplus huge page still exists after, even though it
>>> -	 * should just have been converted to a normal huge page. This
>>> -	 * does not leak memory, though, as the hugepage will be freed
>>> -	 * once it is out of use. It also does not allow the counters to
>>> -	 * go out of whack in adjust_pool_surplus() as we don't modify
>>> -	 * the node values until we've gotten the hugepage and only the
>>> -	 * per-node value is checked there.
>>> -	 */
>>>  	spin_lock(&hugetlb_lock);
>>> -	if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) {
>>> -		spin_unlock(&hugetlb_lock);
>>> -		return NULL;
>>> -	} else {
>>> -		h->nr_huge_pages++;
>>> -		h->surplus_huge_pages++;
>>> -	}
>>> +	if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages)
>>> +		goto out_unlock;
>>>  	spin_unlock(&hugetlb_lock);
>>>
>>>  	page = __hugetlb_alloc_buddy_huge_page(h, gfp_mask, nid, nmask);
>>> +	if (!page)
>>> +		goto out_unlock;
>>>
>>>  	spin_lock(&hugetlb_lock);
>>> -	if (page) {
>>> +	/*
>>> +	 * We could have raced with the pool size change.
>>> +	 * Double check that and simply deallocate the new page
>>> +	 * if we would end up overcommiting the surpluses. Abuse
>>> +	 * temporary page to workaround the nasty free_huge_page
>>> +	 * codeflow
>>> +	 */
>>> +	if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) {
>>> +		SetPageHugeTemporary(page);
>>> +		put_page(page);
>>> +		page = NULL;
>>> +	} else {
>>> +		h->surplus_huge_pages_node[page_to_nid(page)]++;
>>> +		h->surplus_huge_pages++;
>>>  		INIT_LIST_HEAD(&page->lru);
>>>  		r_nid = page_to_nid(page);
>>>  		set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
>>>  		set_hugetlb_cgroup(page, NULL);
>>> -		/*
>>> -		 * We incremented the global counters already
>>> -		 */
>>>  		h->nr_huge_pages_node[r_nid]++;
>>>  		h->surplus_huge_pages_node[r_nid]++;
>>> -	} else {
>>> -		h->nr_huge_pages--;
>>> -		h->surplus_huge_pages--;
>>
>> In the case of a successful surplus allocation, the following counters
>> are incremented:
>>
>> 	h->surplus_huge_pages_node[page_to_nid(page)]++;
>> 	h->surplus_huge_pages++;
>> 	h->nr_huge_pages_node[r_nid]++;
>> 	h->surplus_huge_pages_node[r_nid]++;
>>
>> Looks like per-node surplus_huge_pages_node is incremented twice, and
>> global nr_huge_pages is not incremented at all.
>>
>> Also, you removed r_nid so I'm guessing this will not compile?
>
> Oops, a hiccup during the rebase/split-up. The following code removes
> all of this, so I hadn't noticed. Thanks for catching that!
>
> The incremental diff:
> ---
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 41d2d9082f0d..3c16cde72ceb 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1565,8 +1565,10 @@ static struct page *__alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
>  		put_page(page);
>  		page = NULL;
>  	} else {
> -		h->surplus_huge_pages_node[page_to_nid(page)]++;
> +		int r_nid;
> +
>  		h->surplus_huge_pages++;
> +		h->nr_huge_pages++;
>  		INIT_LIST_HEAD(&page->lru);
>  		r_nid = page_to_nid(page);
>  		set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);

With the incremental diff,

Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>

--
Mike Kravetz
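
For readers following the discussion outside the kernel tree, below is a
minimal userspace sketch of the check-allocate-recheck scheme the patch
switches to. It is plain C with pthreads, not the actual mm/hugetlb.c
code; the names (struct pool, alloc_surplus, overcommit_limit) are made
up for illustration. It mirrors the shape of the fixed kernel function
once the incremental diff is folded in: each global counter is bumped
exactly once, under the same lock used for the recheck.

/*
 * Userspace model of the allocate-then-recheck scheme: check the
 * overcommit limit under the lock, allocate with the lock dropped,
 * then retake the lock and either account the new "page" or throw it
 * away if a racing pool change consumed the budget in the meantime.
 */
#include <pthread.h>
#include <stdlib.h>

struct pool {
	pthread_mutex_t lock;
	unsigned long nr_pages;		/* all pages in the pool */
	unsigned long surplus_pages;	/* pages beyond the static pool */
	unsigned long overcommit_limit;	/* max allowed surplus pages */
};

static void *alloc_surplus(struct pool *p, size_t size)
{
	void *page;

	/* Optimistic check: bail out early if we are already at the limit. */
	pthread_mutex_lock(&p->lock);
	if (p->surplus_pages >= p->overcommit_limit) {
		pthread_mutex_unlock(&p->lock);
		return NULL;
	}
	pthread_mutex_unlock(&p->lock);

	/* The potentially slow allocation happens without the lock held. */
	page = malloc(size);
	if (!page)
		return NULL;

	/*
	 * Recheck under the lock: the limit may have been lowered, or other
	 * allocations may have filled it, while the lock was dropped.  Worst
	 * case we waste the allocation, which is fine for a rare event.
	 */
	pthread_mutex_lock(&p->lock);
	if (p->surplus_pages >= p->overcommit_limit) {
		pthread_mutex_unlock(&p->lock);
		free(page);		/* drop the now-unwanted page */
		return NULL;
	}
	p->nr_pages++;			/* each counter bumped exactly once */
	p->surplus_pages++;
	pthread_mutex_unlock(&p->lock);

	return page;
}

int main(void)
{
	struct pool p = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.overcommit_limit = 1,
	};
	void *page = alloc_surplus(&p, 4096);

	free(page);	/* a real pool would also unaccount the page here */
	return 0;
}

The kernel version additionally keeps per-node counters, which is why it
still derives r_nid from page_to_nid(page) inside the locked section.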