On Wed, Jun 6, 2018 at 2:27 PM, Jakub Racek <jracek@xxxxxxxxxx> wrote:
> Hi,
>
> There is a huge performance regression on the 2 and 4 NUMA node systems on
> stream benchmark with 4.17 kernel compared to 4.16 kernel. Stream, Linpack
> and NAS parallel benchmarks show up to 50% performance drop.
>
> When running for example 20 stream processes in parallel, we see the
> following behavior:
>
> * all processes are started at NODE #1
> * memory is also allocated on NODE #1
> * roughly half of the processes are moved to the NODE #0 very quickly.
> * however, memory is not moved to NODE #0 and stays allocated on NODE #1
>
> As a result, half of the processes are running on NODE #0 with memory
> still allocated on NODE #1. This leads to non-local memory accesses
> and a high Remote-To-Local Memory Access Ratio on the numatop charts.
>
> So it seems that 4.17 is not doing a good job of moving the memory to the
> right NUMA node after the process has been moved.
>
> ----8<----
>
> The above is an excerpt from performance testing on 4.16 and 4.17 kernels.
>
> For now I'm merely making sure the problem is reported.

OK, and why do you think that it is related to ACPI?

Thanks,
Rafael