Re: [PATCH 15/18] sched: Set preferred NUMA node based on number of private faults

Mel Gorman <mgorman@xxxxxxx> · Wed, 31 Jul 2013 11:10:41 +0100

On Wed, Jul 31, 2013 at 11:34:37AM +0200, Peter Zijlstra wrote:
> On Wed, Jul 31, 2013 at 10:29:38AM +0100, Mel Gorman wrote:
> > > Hurmph I just stumbled upon this PMD 'trick' and I'm not at all sure I
> > > like it. If an application would pre-fault/initialize its memory with
> > > the main thread we'll collapse it into a PMDs and forever thereafter (by
> > > virtue of do_pmd_numa_page()) they'll all stay the same. Resulting in
> > > PMD granularity.
> > > 
> > 
> > Potentially yes. When that PMD trick was introduced it was because the cost
> > of faults was very high due to a high scanning rate. The trick mitigated
> > worse-case scenarios until faults were properly accounted for and the scan
> > rates were better controlled. As these *should* be addressed by the series
> > I think I will be adding a patch to kick away this PMD crutch and see how
> > it looks in profiles.
> 
> I've been thinking on this a bit and I think we should split these and
> thp pages when we get shared faults from different nodes on them and
> refuse thp collapses when the pages are on different nodes.
> 

Agreed, I reached the same conclusion when thinking about THP false sharing
just before I went on holiday. The first prototype patch was a bit messy
and performed very badly so "Handle false sharing of THP" was chucked onto
the TODO pile to worry about when I got back. It also collided a little with
the PMD handling of base pages which is another reason to get rid of that.

> With the exception that when we introduce the interleave mempolicies we
> should define 'different node' as being outside of the interleave mask.

Understood.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>