Re: [RFC PATCH v4 11/13] mm: parallelize deferred struct page initialization within each node

Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> · Mon, 12 Nov 2018 08:54:12 -0800

On Sat, Nov 10, 2018 at 03:48:14AM +0000, Elliott, Robert (Persistent Memory) wrote:
> > -----Original Message-----
> > From: linux-kernel-owner@xxxxxxxxxxxxxxx <linux-kernel-
> > owner@xxxxxxxxxxxxxxx> On Behalf Of Daniel Jordan
> > Sent: Monday, November 05, 2018 10:56 AM
> > Subject: [RFC PATCH v4 11/13] mm: parallelize deferred struct page
> > initialization within each node
> > 
> > ...  The kernel doesn't
> > know the memory bandwidth of a given system to get the most efficient
> > number of threads, so there's some guesswork involved.  
> 
> The ACPI HMAT (Heterogeneous Memory Attribute Table) is designed to report
> that kind of information, and could facilitate automatic tuning.
> 
> There was discussion last year about kernel support for it:
> https://lore.kernel.org/lkml/20171214021019.13579-1-ross.zwisler@xxxxxxxxxxxxxxx/

Thanks for bringing this up.  I'm traveling but will take a closer look when I
get back.

> > In testing, a reasonable value turned out to be about a quarter of the
> > CPUs on the node.
> ...
> > +	/*
> > +	 * We'd like to know the memory bandwidth of the chip to
> >         calculate the
> > +	 * most efficient number of threads to start, but we can't.
> > +	 * In testing, a good value for a variety of systems was a
> >         quarter of the CPUs on the node.
> > +	 */
> > +	nr_node_cpus = DIV_ROUND_UP(cpumask_weight(cpumask), 4);
> 
> 
> You might want to base that calculation on and limit the threads to
> physical cores, not hyperthreaded cores.

Why?  Hyperthreads can be beneficial when waiting on memory.  That said, I
don't have data that shows that in this case.