Re: [PATCH] mm,thp,shmem: limit shmem THP alloc gfp_mask

Rik van Riel <riel@xxxxxxxxxxx> · Thu, 22 Oct 2020 09:25:21 -0400

On Thu, 2020-10-22 at 10:15 +0200, Michal Hocko wrote:
> On Wed 21-10-20 23:48:46, Rik van Riel wrote:
> > The allocation flags of anonymous transparent huge pages can be
> > controlled
> > through the files in /sys/kernel/mm/transparent_hugepage/defrag,
> > which can
> > help the system from getting bogged down in the page reclaim and
> > compaction
> > code when many THPs are getting allocated simultaneously.
> > 
> > However, the gfp_mask for shmem THP allocations were not limited by
> > those
> > configuration settings, and some workloads ended up with all CPUs
> > stuck
> > on the LRU lock in the page reclaim code, trying to allocate dozens
> > of
> > THPs simultaneously.
> > 
> > This patch applies the same configurated limitation of THPs to
> > shmem
> > hugepage allocations, to prevent that from happening.
> > 
> > This way a THP defrag setting of "never" or "defer+madvise" will
> > result
> > in quick allocation failures without direct reclaim when no 2MB
> > free
> > pages are available.
> 
> I remmeber I wanted to unify this in the past as well. The patch got
> reverted in the end. It was a long story and I do not have time to
> read
> through lengthy discussions again. I do remember though that there
> were
> some objections pointing out that shmem has its own behavior which is
> controlled by the mount option - at least for the explicitly mounted
> shmems. I might misremember.

That is not entirely true, though.

THP has two main sysfs knobs: "enabled" and "defrag"

The mount options
control the shmem equivalent options
for "enabled", but they do not do anything for the "defrag"
equivalent options.

This patch applies the "defrag" THP options to
shmem.

> [...]
> 
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 537c137698f8..d1290eb508e5 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1545,8 +1545,11 @@ static struct page
> > *shmem_alloc_hugepage(gfp_t gfp,
> >  		return NULL;
> >  
> >  	shmem_pseudo_vma_init(&pvma, info, hindex);
> > -	page = alloc_pages_vma(gfp | __GFP_COMP | __GFP_NORETRY |
> > __GFP_NOWARN,
> > -			HPAGE_PMD_ORDER, &pvma, 0, numa_node_id(),
> > true);
> > +	/* Limit the gfp mask according to THP configuration. */
> > +	gfp |= __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN;
> 
> What is the reason for these when alloc_hugepage_direct_gfpmask
> provides
> the full mask?

The mapping_gfp_mask for the shmem file might have additional
restrictions above and beyond the gfp mask returned by
alloc_hugepage_direct_gfpmask, and I am not sure we should just
ignore the mapping_gfp_mask.

That is why this patch takes the union of both gfp masks.

However, digging into things more, it looks like shmem inodes
always have the mapping gfp mask set to GFP_HIGHUSER_MOVABLE,
and it is never changed, so simply using the output from
alloc_hugepage_direct_gfpmask should be fine.

I can do the patch either way. Just let me know what you prefer.

> > +	gfp &= alloc_hugepage_direct_gfpmask(&pvma);
> > +	page = alloc_pages_vma(gfp, HPAGE_PMD_ORDER, &pvma, 0,
> > numa_node_id(),
> > +			       true);
> >  	shmem_pseudo_vma_destroy(&pvma);
> >  	if (page)
> >  		prep_transhuge_page(page);
> > 
> > -- 
> > All rights reversed.
-- 
All Rights Reversed.
Attachment:
signature.asc

Description: This is a digitally signed message part