Re: [PATCH 10/10] mm/hugetlb: not necessary to abuse temporary page to workaround the nasty free_huge_page

Michal Hocko <mhocko@xxxxxxxx> · Thu, 13 Aug 2020 13:46:38 +0200

On Tue 11-08-20 14:43:28, Mike Kravetz wrote:
> On 8/10/20 11:54 PM, Michal Hocko wrote:
> > 
> > I have managed to forget all the juicy details since I made that
> > change. All that remains is that the surplus pages accounting was quite
> > tricky and back then I didn't figure out a simpler method that would
> > achieve a consistent view of those counters. As mentioned above I
> > suspect this could lead to premature allocation failures while the
> > migration is ongoing.
> 
> It is likely lost in the e-mail thread, but the suggested change was to
> alloc_surplus_huge_page().  The code which allocates the migration target
> (alloc_migrate_huge_page) will not be changed.  So, this should not be
> an issue.

OK, I obviously missed that.

> >                       Sure, it is quite unlikely to happen and the race window
> > is likely very small. Maybe this is even acceptable but I would strongly
> > recommend having all this thinking documented in the changelog.
> 
> I wrote down a description of what happens in the two different approaches
> "temporary page" vs "surplus page".  It is at the very end of this e-mail.
> When looking at the details, I came up with what may be an even better
> approach.  Why not just call the low level routine to free the page instead
> of going through put_page/free_huge_page?  At the very least, it saves a
> lock roundtrip and there is no need to worry about the counters/accounting.
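To make the lock round-trip argument concrete, here is a minimal userspace sketch, not kernel code. pool_model, page_model, destructor(), low_level_free(), discard_old() and discard_new() are hypothetical stand-ins for hugetlb_lock, free_huge_page() and update_and_free_page(). The old path marks the page temporary, drops the lock and lets the final put run the destructor, which takes the lock again; the new path drops the only reference without the destructor and frees while the lock is still held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct pool_model {
    pthread_mutex_t lock;      /* plays the role of hugetlb_lock */
    unsigned long surplus;     /* plays the role of h->surplus_huge_pages */
    unsigned long lock_trips;  /* number of times the lock was taken */
};

struct page_model {
    int refcount;
    bool temporary;            /* plays the role of PageHugeTemporary */
};

/* Stand-in for update_and_free_page(): release memory, no accounting. */
static void low_level_free(struct page_model *p)
{
    free(p);
}

/* Stand-in for free_huge_page(): the destructor retakes the lock. */
static void destructor(struct pool_model *pool, struct page_model *p)
{
    pthread_mutex_lock(&pool->lock);
    pool->lock_trips++;
    if (!p->temporary)
        pool->surplus--;       /* a temporary page must skip the counters */
    low_level_free(p);
    pthread_mutex_unlock(&pool->lock);
}

/* Old path: mark temporary, drop the lock, then the last put runs the destructor. */
static void discard_old(struct pool_model *pool, struct page_model *p)
{
    p->temporary = true;       /* caller holds pool->lock */
    pthread_mutex_unlock(&pool->lock);
    if (--p->refcount == 0)
        destructor(pool, p);
}

/* New path: drop the only reference without the destructor, free under the lock. */
static void discard_new(struct pool_model *pool, struct page_model *p)
{
    int remaining = --p->refcount;  /* caller holds pool->lock */

    assert(remaining == 0);         /* model check: we held the only reference */
    (void)remaining;
    low_level_free(p);
    pthread_mutex_unlock(&pool->lock);
}

/* Take the lock for the recheck, discard the page, return the lock count. */
static unsigned long trips_for(void (*discard)(struct pool_model *, struct page_model *))
{
    struct pool_model pool = { .surplus = 0, .lock_trips = 0 };
    struct page_model *p = malloc(sizeof(*p));

    if (!p)
        return 0;
    p->refcount = 1;
    p->temporary = false;
    pthread_mutex_init(&pool.lock, NULL);

    pthread_mutex_lock(&pool.lock);
    pool.lock_trips++;
    discard(&pool, p);

    pthread_mutex_destroy(&pool.lock);
    return pool.lock_trips;
}
```

Under these assumptions, trips_for(discard_old) counts two lock acquisitions and trips_for(discard_new) counts one, which is the "at least a lock roundtrip" saved on this error path.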
> 
> Here is a patch to do that.  However, we are optimizing a return path in
> a race condition that we are unlikely to ever hit.  I 'tested' it by allocating
> an 'extra' page and freeing it via this method in alloc_surplus_huge_page.
> 
> From 864c5f8ef4900c95ca3f6f2363a85f3cb25e793e Mon Sep 17 00:00:00 2001
> From: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Date: Tue, 11 Aug 2020 12:45:41 -0700
> Subject: [PATCH] hugetlb: optimize race error return in
>  alloc_surplus_huge_page
> 
> The routine alloc_surplus_huge_page() could race with a pool
> size change.  If this happens, the allocated page may not be needed.
> To free the page, the current code will 'Abuse temporary page to
> workaround the nasty free_huge_page codeflow'.  Instead, directly
> call the low level routine that free_huge_page uses.  This works
> out well because the page is new, we hold the only reference and
> already hold the hugetlb_lock.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> ---
>  mm/hugetlb.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 590111ea6975..ac89b91fba86 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1923,14 +1923,17 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
>  	/*
>  	 * We could have raced with the pool size change.
>  	 * Double check that and simply deallocate the new page
> -	 * if we would end up overcommiting the surpluses. Abuse
> -	 * temporary page to workaround the nasty free_huge_page
> -	 * codeflow
> +	 * if we would end up overcommitting the surpluses.
>  	 */
>  	if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) {
> -		SetPageHugeTemporary(page);
> +		/*
> +		 * Since this page is new, we hold the only reference, and
> +		 * we already hold the hugetlb_lock, call the low level free
> +		 * page routine.  This saves at least a lock roundtrip.
> +		 */
> +		(void)put_page_testzero(page); /* don't call destructor */
> +		update_and_free_page(h, page);
>  		spin_unlock(&hugetlb_lock);
> -		put_page(page);
>  		return NULL;
>  	} else {
>  		h->surplus_huge_pages++;
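The double check that this hunk modifies can be modeled in userspace as a simplified sketch. hstate_model, alloc_surplus() and the shrink_to_zero() hook are hypothetical; malloc()/free() stand in for the page allocator and update_and_free_page(), and the hook simulates a concurrent pool size change while the lock is dropped for the allocation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace model of the surplus accounting; not kernel code. */
struct hstate_model {
    pthread_mutex_t lock;        /* plays the role of hugetlb_lock */
    unsigned long surplus;       /* plays the role of h->surplus_huge_pages */
    unsigned long overcommit;    /* plays the role of h->nr_overcommit_huge_pages */
};

struct page_model {
    int refcount;
};

/* Optional hook run while the lock is dropped, to simulate a racing resize. */
typedef void (*race_hook)(struct hstate_model *h);

static struct page_model *alloc_surplus(struct hstate_model *h, race_hook during_alloc)
{
    struct page_model *p;

    pthread_mutex_lock(&h->lock);
    if (h->surplus >= h->overcommit) {       /* first check */
        pthread_mutex_unlock(&h->lock);
        return NULL;
    }
    pthread_mutex_unlock(&h->lock);

    p = malloc(sizeof(*p));                  /* allocate without the lock */
    if (!p)
        return NULL;
    p->refcount = 1;                         /* the only reference is ours */
    if (during_alloc)
        during_alloc(h);                     /* the pool may change here */

    pthread_mutex_lock(&h->lock);
    if (h->surplus >= h->overcommit) {       /* double check */
        p->refcount--;                       /* drop our ref, no destructor */
        assert(p->refcount == 0);
        free(p);                             /* direct low-level free */
        pthread_mutex_unlock(&h->lock);
        return NULL;
    }
    h->surplus++;
    pthread_mutex_unlock(&h->lock);
    return p;
}

/* Example hook: another thread lowers the overcommit limit to zero. */
static void shrink_to_zero(struct hstate_model *h)
{
    pthread_mutex_lock(&h->lock);
    h->overcommit = 0;
    pthread_mutex_unlock(&h->lock);
}
```

In this model, losing the race leaves the surplus counter untouched, because the page is freed without ever being accounted.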

Yes, this makes sense. I would have to think about this more to be
confident and give my Acked-by, but this looks sensible at a quick glance.

Thanks!
-- 
Michal Hocko
SUSE Labs