Nitin Gupta writes:
 > On 1/4/16 1:13 PM, Alexandre Chartre wrote:
 > >
 > > A Sun Blade 2500 is sun4u so there's no MD; the MD is only available
 > > on sun4v.
 > >
 > > alex.
 > >
 > I see. I'm currently initializing the numa node distance matrix only in
 > the case where an MD exists, which is wrong.
 >
 > Mikael: Can you please try the patch below, which moves the
 > initialization earlier so that it happens for both sun4u and sun4v?
 >
 > Thanks,
 > Nitin

Thanks, this fixed the problem.  I'm currently doing a GCC 6 bootstrap
and regtest on 4.4-rc8 + this patch, and things look good again.

Tested-by: Mikael Pettersson <mikpelinux@xxxxxxxxx>

 > diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
 > index 3025bd5..ff63db5 100644
 > --- a/arch/sparc/mm/init_64.c
 > +++ b/arch/sparc/mm/init_64.c
 > @@ -1267,13 +1267,6 @@ static int __init numa_parse_mdesc(void)
 >  	int i, j, err, count;
 >  	u64 node;
 >
 > -	/* Some sane defaults for numa latency values */
 > -	for (i = 0; i < MAX_NUMNODES; i++) {
 > -		for (j = 0; j < MAX_NUMNODES; j++)
 > -			numa_latency[i][j] = (i == j) ?
 > -				LOCAL_DISTANCE : REMOTE_DISTANCE;
 > -	}
 > -
 >  	node = mdesc_node_by_name(md, MDESC_NODE_NULL, "latency-groups");
 >  	if (node == MDESC_NODE_NULL) {
 >  		mdesc_release(md);
 > @@ -1374,6 +1367,14 @@ static int __init bootmem_init_numa(void)
 >  	numadbg("bootmem_init_numa()\n");
 >
 >  	if (numa_enabled) {
 > +		int i, j;
 > +		/* Some sane defaults for numa latency values */
 > +		for (i = 0; i < MAX_NUMNODES; i++) {
 > +			for (j = 0; j < MAX_NUMNODES; j++)
 > +				numa_latency[i][j] = (i == j) ?
 > +					LOCAL_DISTANCE : REMOTE_DISTANCE;
 > +		}
 > +
 >  		if (tlb_type == hypervisor)
 >  			err = numa_parse_mdesc();
 >  		else

 > >> On Jan 4, 2016, at 06:57, Nitin Gupta <nitin.m.gupta@xxxxxxxxxx> wrote:
 > >>
 > >> Mike,
 > >>
 > >> I believe this is due to the firmware exporting wrong/incomplete
 > >> information about memory latency groups in the machine descriptor (MD).
 > >> Before this patch, this information was not used at all and the kernel
 > >> always used default values for the numa node distances. With incorrect
 > >> values, the scheduler can have a skewed view of the machine, causing
 > >> this non-optimal usage. My testing on T7, T5 and T4 with recent
 > >> firmware never showed such issues.
 > >>
 > >> Can you please provide the output of 'numactl --hardware' on your
 > >> machine? Ideally, I would also need a dump of the MD, but I don't have
 > >> a script handy for this which I can share externally.
 > >>
 > >> Dave: would you have a script to dump the MD which you can share?
 > >>
 > >> Thanks,
 > >> Nitin
 > >>
 > >>>> From: Mikael Pettersson <mikpelinux@xxxxxxxxx>
 > >>>> Subject: [BISECTED] "sparc64: Fix numa distance values" breakage (was: 4.4-rc kernels only use one of two CPUs on Sun Blade 2500)
 > >>>> Date: December 30, 2015 at 9:18:57 AM MST
 > >>>> To: Mikael Pettersson <mikpelinux@xxxxxxxxx>
 > >>>> Cc: Linux SPARC Kernel Mailing List <sparclinux@xxxxxxxxxxxxxxx>
 > >>>>
 > >>>> Mikael Pettersson writes:
 > >>>>> Something is causing the 4.4-rc kernels to only use half the CPU
 > >>>>> capacity of my Sun Blade 2500 (dual USIIIi). The kernel does detect
 > >>>>> both CPUs, but it doesn't seem to want to schedule processes on
 > >>>>> both of them. During CPU-intensive jobs like GCC bootstraps, 'top'
 > >>>>> indicates the machine is 50% idle and aggregate CPU usage is 100%
 > >>>>> (it should be 200%). This is completely deterministic.
 > >>>>>
 > >>>>> Going back to 4.3.0 resolves the problem.
 > >>>>
 > >>>> A git bisect identified the commit below as the culprit.
 > >>>> I've confirmed that reverting it from 4.4-rc7 solves the problem.
 > >>>>
 > >>>> commit 52708d690b8be132ba9d294464625dbbdb9fa5df
 > >>>> Author: Nitin Gupta <nitin.m.gupta@xxxxxxxxxx>
 > >>>> Date:   Mon Nov 2 16:30:24 2015 -0500
 > >>>>
 > >>>>     sparc64: Fix numa distance values
 > >>>>
 > >>>>     Orabug: 21896119
 > >>>>
 > >>>>     Use the machine descriptor (MD) to get node latency
 > >>>>     values instead of just using default values.
 > >>>>
 > >>>>     Testing:
 > >>>>     On a T5-8 system with:
 > >>>>      - total nodes = 8
 > >>>>      - self latencies = 0x26d18
 > >>>>      - latency to other nodes = 0x3a598
 > >>>>        => latency ratio = ~1.5
 > >>>>
 > >>>>     output of numactl --hardware:
 > >>>>
 > >>>>      - before fix:
 > >>>>
 > >>>>     node distances:
 > >>>>     node   0   1   2   3   4   5   6   7
 > >>>>       0:  10  20  20  20  20  20  20  20
 > >>>>       1:  20  10  20  20  20  20  20  20
 > >>>>       2:  20  20  10  20  20  20  20  20
 > >>>>       3:  20  20  20  10  20  20  20  20
 > >>>>       4:  20  20  20  20  10  20  20  20
 > >>>>       5:  20  20  20  20  20  10  20  20
 > >>>>       6:  20  20  20  20  20  20  10  20
 > >>>>       7:  20  20  20  20  20  20  20  10
 > >>>>
 > >>>>      - after fix:
 > >>>>
 > >>>>     node distances:
 > >>>>     node   0   1   2   3   4   5   6   7
 > >>>>       0:  10  15  15  15  15  15  15  15
 > >>>>       1:  15  10  15  15  15  15  15  15
 > >>>>       2:  15  15  10  15  15  15  15  15
 > >>>>       3:  15  15  15  10  15  15  15  15
 > >>>>       4:  15  15  15  15  10  15  15  15
 > >>>>       5:  15  15  15  15  15  10  15  15
 > >>>>       6:  15  15  15  15  15  15  10  15
 > >>>>       7:  15  15  15  15  15  15  15  10
 > >>>>
 > >>>>     Signed-off-by: Nitin Gupta <nitin.m.gupta@xxxxxxxxxx>
 > >>>>     Reviewed-by: Chris Hyser <chris.hyser@xxxxxxxxxx>
 > >>>>     Reviewed-by: Santosh Shilimkar <santosh.shilimkar@xxxxxxxxxx>
 > >>>>     Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
 > >>>> --
 > >>
 > >> --
 > >> To unsubscribe from this list: send the line "unsubscribe sparclinux" in
 > >> the body of a message to majordomo@xxxxxxxxxxxxxxx
 > >> More majordomo info at http://vger.kernel.org/majordomo-info.html
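[Editor's note: the "sane defaults" loop that the patch moves into
bootmem_init_numa() can be sketched as a standalone user-space program.
This is a minimal sketch, not kernel code: LOCAL_DISTANCE (10) and
REMOTE_DISTANCE (20) mirror the kernel's generic defaults, MAX_NUMNODES is
shrunk to 8 here to match the T5-8 example, and the helper names
init_default_numa_latency()/print_node_distances() are illustrative -- in
the kernel the loop is open-coded.]

```c
#include <stdio.h>

#define MAX_NUMNODES    8   /* shrunk from the kernel's config value for this sketch */
#define LOCAL_DISTANCE  10  /* distance of a node to itself */
#define REMOTE_DISTANCE 20  /* default distance to any other node */

static int numa_latency[MAX_NUMNODES][MAX_NUMNODES];

/* Same "sane defaults" loop as the patch, pulled out into a helper. */
static void init_default_numa_latency(void)
{
	int i, j;

	for (i = 0; i < MAX_NUMNODES; i++) {
		for (j = 0; j < MAX_NUMNODES; j++)
			numa_latency[i][j] = (i == j) ?
				LOCAL_DISTANCE : REMOTE_DISTANCE;
	}
}

/* Print the matrix in roughly the layout of 'numactl --hardware'. */
static void print_node_distances(void)
{
	int i, j;

	printf("node distances:\n");
	for (i = 0; i < MAX_NUMNODES; i++) {
		printf("%3d:", i);
		for (j = 0; j < MAX_NUMNODES; j++)
			printf(" %3d", numa_latency[i][j]);
		printf("\n");
	}
}
```

Run from a small main(), this reproduces the diagonal-10 / off-diagonal-20
table shown as the "before fix" numactl output above, which is the matrix a
machine without usable latency-groups data (e.g. sun4u, which has no MD)
should still end up with after the fix.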