Re: [PATCH 10/10] mm/hugetlb: not necessary to abuse temporary page to workaround the nasty free_huge_page

On 08/11/20 at 08:54am, Michal Hocko wrote:
> On Tue 11-08-20 09:51:48, Baoquan He wrote:
> > On 08/10/20 at 05:19pm, Mike Kravetz wrote:
> > > On 8/9/20 7:17 PM, Baoquan He wrote:
> > > > On 08/07/20 at 05:12pm, Wei Yang wrote:
> > > >> Let's always increase surplus_huge_pages and so that free_huge_page
> > > >> could decrease it at free time.
> > > >>
> > > >> Signed-off-by: Wei Yang <richard.weiyang@xxxxxxxxxxxxxxxxx>
> > > >> ---
> > > >>  mm/hugetlb.c | 14 ++++++--------
> > > >>  1 file changed, 6 insertions(+), 8 deletions(-)
> > > >>
> > > >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > >> index 1f2010c9dd8d..a0eb81e0e4c5 100644
> > > >> --- a/mm/hugetlb.c
> > > >> +++ b/mm/hugetlb.c
> > > >> @@ -1913,21 +1913,19 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
> > > >>  		return NULL;
> > > >>  
> > > >>  	spin_lock(&hugetlb_lock);
> > > >> +
> > > >> +	h->surplus_huge_pages++;
> > > >> +	h->surplus_huge_pages_node[page_to_nid(page)]++;
> > > >> +
> > > >>  	/*
> > > >>  	 * We could have raced with the pool size change.
> > > >>  	 * Double check that and simply deallocate the new page
> > > >> -	 * if we would end up overcommiting the surpluses. Abuse
> > > >> -	 * temporary page to workaround the nasty free_huge_page
> > > >> -	 * codeflow
> > > >> +	 * if we would end up overcommiting the surpluses.
> > > >>  	 */
> > > >> -	if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) {
> > > >> -		SetPageHugeTemporary(page);
> > > > 
> > > > Hmm, the temporary page way is taken intentionally in
> > > > commit 9980d744a0428 ("mm, hugetlb: get rid of surplus page accounting tricks").
> > > > From code, this is done inside hugetlb_lock holding, and the code flow
> > > > is straightforward, should be safe. Adding Michal to CC.
> 
> But the lock is not held during the migration, right?

I see what I misunderstood about holding hugetlb_lock. In
alloc_surplus_huge_page(), put_page() is called after hugetlb_lock is
released; I mistakenly thought put_page() was called with hugetlb_lock
held. Yes, there is obviously a race window, and the temporary page
approach is an effective way to avoid messing up the surplus_huge_pages
accounting.
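
To make the window concrete, here is a rough userspace model of the two
approaches. It is only a sketch, not kernel code: struct model_hstate,
struct model_page and all model_*() helpers are made up for illustration,
locking is indicated by comments only, and the "surplus > overcommit" check
in the count-first variant is my assumption, since the quoted hunk is cut
off before the new condition. peak_surplus records the largest value another
CPU could observe once the lock has been dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: these are NOT the kernel structures. */
struct model_hstate {
	long surplus;      /* stands in for h->surplus_huge_pages */
	long overcommit;   /* stands in for h->nr_overcommit_huge_pages */
	long peak_surplus; /* highest surplus visible after "unlock" */
};

struct model_page {
	bool temporary;    /* stands in for PageHugeTemporary() */
};

static void model_note_peak(struct model_hstate *h)
{
	if (h->surplus > h->peak_surplus)
		h->peak_surplus = h->surplus;
}

/* Stands in for the free path: temporary pages skip the surplus counter. */
static void model_free(struct model_hstate *h, struct model_page *page)
{
	if (page->temporary)
		return;
	assert(h->surplus > 0);
	h->surplus--;
}

/* Temporary page way: decide under the lock, never touch the counter. */
static bool model_alloc_temporary(struct model_hstate *h,
				  struct model_page *page)
{
	page->temporary = false;
	/* "lock" taken here */
	if (h->surplus >= h->overcommit) {
		page->temporary = true;
		/* "unlock"; window before the free, counter unchanged */
		model_note_peak(h);
		model_free(h, page);
		return false;
	}
	h->surplus++;
	/* "unlock" */
	model_note_peak(h);
	return true;
}

/* Count-first way: bump the counter, let the free path undo it. */
static bool model_alloc_count_first(struct model_hstate *h,
				    struct model_page *page)
{
	page->temporary = false;
	/* "lock" taken here */
	h->surplus++;
	if (h->surplus > h->overcommit) {
		/* "unlock"; window before the free, counter is inflated */
		model_note_peak(h);
		model_free(h, page);
		return false;
	}
	/* "unlock" */
	model_note_peak(h);
	return true;
}
```

Starting from surplus == overcommit (the state a racing pool resize can
leave behind), both variants fail the allocation and end with the same
surplus value. The difference is only in peak_surplus: the temporary page
variant never shows a value above overcommit, while the count-first variant
shows overcommit + 1 until the free path has run.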

> 
> > > I remember when the temporary page code was added for page migration.
> > > The use of temporary page here was added at about the same time.  Temporary
> > > page does have one advantage in that it will not CAUSE surplus count to
> > > exceed overcommit.  This patch could cause surplus to exceed overcommit
> > > for a very short period of time.  However, do note that for this to happen
> > > the code needs to race with a pool resize which itself could cause surplus
> > > to exceed overcommit.
> 
> Correct.
> 
> > > IMO both approaches are valid.
> > > - Advantage of temporary page is that it can not cause surplus to exceed
> > >   overcommit.  Disadvantage is as mentioned in the comment 'abuse of temporary
> > >   page'.
> > > - Advantage of this patch is that it uses existing counters.  Disadvantage
> > >   is that it can momentarily cause surplus to exceed overcommit.
> 
> Do I remember correctly that this can cause an allocation failure due to
> overcommit check? In other words it would be user space visible thing?
> 
> > Yeah, since it's all done inside hugetlb_lock, should be OK even
> > though it may cause surplus to exceed overcommit.
> > > 
> > > Unless someone has a strong opinion, I prefer the changes in this patch.
> > 
> > Agree, I also prefer the code change in this patch, to remove the
> > unnecessary confusion about the temporary page.
> 
> I have managed to forget all the juicy details since I made that
> change. All that remains is that the surplus pages accounting was quite
> tricky and back then I didn't figure out a simpler method that would
> achieve a consistent view of those counters. As mentioned above I
> suspect this could lead to premature allocation failures while the
> migration is ongoing. Sure, quite unlikely to happen and the race window
> is likely very small. Maybe this is even acceptable but I would strongly
> recommend having all this thinking documented in the changelog.
> -- 
> Michal Hocko
> SUSE Labs
> 




