On Sat, Nov 10, 2018 at 03:48:14AM +0000, Elliott, Robert (Persistent Memory) wrote:
> > -----Original Message-----
> > From: linux-kernel-owner@xxxxxxxxxxxxxxx <linux-kernel-owner@xxxxxxxxxxxxxxx> On Behalf Of Daniel Jordan
> > Sent: Monday, November 05, 2018 10:56 AM
> > Subject: [RFC PATCH v4 11/13] mm: parallelize deferred struct page
> > initialization within each node
> >
> > ...  The kernel doesn't
> > know the memory bandwidth of a given system to get the most efficient
> > number of threads, so there's some guesswork involved.
>
> The ACPI HMAT (Heterogeneous Memory Attribute Table) is designed to report
> that kind of information, and could facilitate automatic tuning.
>
> There was discussion last year about kernel support for it:
> https://lore.kernel.org/lkml/20171214021019.13579-1-ross.zwisler@xxxxxxxxxxxxxxx/

Thanks for bringing this up.  I'm traveling but will take a closer look when I
get back.

> > In testing, a reasonable value turned out to be about a quarter of the
> > CPUs on the node.
> ...
> > +	/*
> > +	 * We'd like to know the memory bandwidth of the chip to calculate the
> > +	 * most efficient number of threads to start, but we can't.
> > +	 * In testing, a good value for a variety of systems was a quarter of the CPUs on the node.
> > +	 */
> > +	nr_node_cpus = DIV_ROUND_UP(cpumask_weight(cpumask), 4);
>
> You might want to base that calculation on and limit the threads to
> physical cores, not hyperthreaded cores.

Why?  Hyperthreads can be beneficial when waiting on memory.  That said, I don't
have data that shows that in this case.
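
For reference, the two ways of sizing the thread count discussed above can be compared with a small userspace sketch. This is only an illustration of the arithmetic, not code from the patch: the function names threads_quarter_of_cpus() and threads_quarter_of_cores() and the threads_per_core parameter are hypothetical, DIV_ROUND_UP is redefined here to mirror the kernel macro, and the core-based variant assumes every core has the same number of hardware threads.

```c
/* Same arithmetic as the kernel's DIV_ROUND_UP macro, redefined for userspace. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Heuristic quoted from the patch: about a quarter of the node's logical CPUs. */
static unsigned int threads_quarter_of_cpus(unsigned int nr_logical_cpus)
{
	return DIV_ROUND_UP(nr_logical_cpus, 4);
}

/*
 * Hypothetical variant of the reviewer's suggestion: base the count on
 * physical cores only. Assumes a uniform number of hardware threads per
 * core; a value of 0 is treated as 1 to avoid dividing by zero.
 */
static unsigned int threads_quarter_of_cores(unsigned int nr_logical_cpus,
					     unsigned int threads_per_core)
{
	unsigned int nr_cores;

	if (threads_per_core == 0)
		threads_per_core = 1;

	nr_cores = nr_logical_cpus / threads_per_core;

	return DIV_ROUND_UP(nr_cores, 4);
}
```

On a node with 36 logical CPUs and 2 hardware threads per core, the first function returns 9 and the second returns 5. Which of the two initializes struct pages faster is exactly the open question in this thread; the sketch says nothing about that.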