Re: [RFC PATCH 05/26] mm: page_alloc: per-migratetype pcplist for THPs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 21, 2023 at 11:06:48AM -0400, Johannes Weiner wrote:
> On Fri, Apr 21, 2023 at 01:47:44PM +0100, Mel Gorman wrote:
> > On Tue, Apr 18, 2023 at 03:12:52PM -0400, Johannes Weiner wrote:
> > > Right now, there is only one pcplist for THP allocations. However,
> > > while most THPs are movable, the huge zero page is not. This means a
> > > movable THP allocation can grab an unmovable block from the pcplist,
> > > and a subsequent THP split, partial free, and reallocation of the
> > > remainder will mix movable and unmovable pages in the block.
> > > 
> > > While this isn't a huge source of block pollution in practice, it
> > > happens often enough to trigger debug warnings fairly quickly under
> > > load. In the interest of tightening up pageblock hygiene, make the THP
> > > pcplists fully migratetype-aware, just like the lower order ones.
> > > 
> > > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> > 
> > Split out :P
> > 
> > Take special care of this one because, while I didn't check this, I
> > suspect it'll push the PCP structure size into the next cache line and
> > increase overhead.
> >
> > The changelog makes it unclear why exactly this happens or why the
> > patch fixes it.
> 
> Before this, I'd see warnings from the last patch in the series about
> received migratetype not matching requested mt.
> 
> The way it happens is that the zero page gets freed and the unmovable
> block put on the pcplist. A regular THP allocation is subsequently
> served from an unmovable block.
> 
> Mental note, I think this can happen the other way around too: a
> regular THP on the pcp being served to a MIGRATE_UNMOVABLE zero
> THP. It's not supposed to, but it looks like there is a bug in the
> code that's meant to prevent that from happening in rmqueue():
> 
> 	if (likely(pcp_allowed_order(order))) {
> 		/*
> 		 * MIGRATE_MOVABLE pcplist could have the pages on CMA area and
> 		 * we need to skip it when CMA area isn't allowed.
> 		 */
> 		if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
> 				migratetype != MIGRATE_MOVABLE) {
> 			page = rmqueue_pcplist(preferred_zone, zone, order,
> 					migratetype, alloc_flags);
> 			if (likely(page))
> 				goto out;
> 		}
> 	}
> 
> Surely that last condition should be migratetype == MIGRATE_MOVABLE?
> 

It should be. It would have been missed for ages because it would need a
test case based on a machine configuration that requires CMA for functional
correctness and is using THP which is an unlikely combination.

> > The huge zero page strips GFP_MOVABLE (so unmovable)
> > but at allocation time, it doesn't really matter what the movable type
> > is because it's a full pageblock. It doesn't appear to be a hazard until
> > the split happens. Assuming that's the case, it should be ok to always
> > set the pageblock movable for THP allocations regardless of GFP flags at
> > allocation time or else set the pageblock MOVABLE at THP split (always
> > MOVABLE at allocation time makes more sense).
> 
> The regular allocator compaction skips over compound pages anyway, so
> the migratetype should indeed not matter there.
> 
> The bigger issue is CMA. alloc_contig_range() will try to move THPs to
> free a larger range. We have to be careful not to place an unmovable
> zero THP into a CMA region. That means we can not play games with MT -
> we really do have to physically keep unmovable and movable THPs apart.
> 

Fair point.

> Another option would be not to use pcp for the zero THP. It's cached
> anyway in the caller. But it would add branches to the THP alloc and
> free fast paths (pcp_allowed_order() also checking migratetype).

And this is probably the most straight-forward option. The intent behind
caching some THPs on PCP was faulting large mappings of normal THPs and
reducing the contention on the zone lock a little. The zero THP is somewhat
special because it should not be allocated at high frequency.

-- 
Mel Gorman
SUSE Labs




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux