On Mon, Jun 11, 2018 at 06:07:58PM +0200, Jirka Hladky wrote:
> > Fixing any part of it for STREAM will end up regressing something else.
>
> I fully understand that. We run a set of benchmarks and we always look at
> the results as the ensemble. Looking only at one benchmark would be
> completely wrong.
>

Indeed.

> And in fact, we do see regression on NAS benchmark going from 4.16 to 4.17
> kernel as well. On 4 NUMA node server with Xeon Gold CPUs we see the
> regression around 26% for ft_C, 35% for mg_C_x and 25% for sp_C_x. The
> biggest regression is with 32 threads (the box has 96 CPUs in total). I
> have not yet tried if it's linked to
> 2c83362734dad8e48ccc0710b5cd2436a0323893. I will do that testing tomorrow.
>

It would be worthwhile. However, it's also worth noting that 32 threads
out of 96 CPUs implies that the 4 nodes would not be evenly used, and that
may account for some of the discrepancy. ft and mg for class C are
typically short-lived on modern hardware, and sp is not particularly
long-lived either. Hence, they are the most likely to see problems with a
patch that avoids spreading tasks across the machine early. Admittedly, I
have not seen similar slowdowns, but NAS has a lot of configuration
options.

In terms of the speed of migration, it may be worth checking how often
the mm_numa_migrate_ratelimit tracepoint is triggered, with bonus points
for using its nr_pages field to calculate how many pages get throttled
from migrating. If it fires frequently, then you could test increasing
ratelimit_pages (which is set at compile time despite not being a macro).
It still may not work for tasks that are too short-lived to have enough
time to identify a misplacement and migrate.

-- 
Mel Gorman
SUSE Labs
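A minimal sketch of the tracepoint check suggested above: count how often mm_numa_migrate_ratelimit fires and sum its nr_pages field to estimate how many pages were throttled from migrating. It assumes the event was captured as text ftrace output (for example saved from trace_pipe with the event enabled) and that each event line carries `dst_nid=<nid> nr_pages=<n>` fields; that layout is an assumption to check against the running kernel's event format file. The `summarize` helper is hypothetical and not part of any kernel tool.

```python
import re
import sys
from collections import Counter

# Assumption: each event line looks roughly like
#   "... mm_numa_migrate_ratelimit: comm=<name> pid=<pid> dst_nid=<nid> nr_pages=<n>"
# Verify against the event's format file on the kernel under test.
EVENT = "mm_numa_migrate_ratelimit:"
FIELDS = re.compile(r"dst_nid=(\d+)\s+nr_pages=(\d+)")


def summarize(lines):
    """Return (event count, total throttled pages, throttled pages per destination node)."""
    events = 0
    total_pages = 0
    per_node = Counter()
    for line in lines:
        # Skip lines from any other tracepoint.
        if EVENT not in line:
            continue
        match = FIELDS.search(line)
        if match is None:
            # Field layout differs from the assumed one; ignore the line.
            continue
        nid, nr_pages = int(match.group(1)), int(match.group(2))
        events += 1
        total_pages += nr_pages
        per_node[nid] += nr_pages
    return events, total_pages, per_node


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python ratelimit_summary.py saved_trace.txt
    with open(sys.argv[1]) as trace:
        events, pages, per_node = summarize(trace)
    print(f"{events} ratelimit events, {pages} pages throttled")
    for nid in sorted(per_node):
        print(f"  node {nid}: {per_node[nid]} pages")
```

A high event count relative to the benchmark's runtime would be the signal to try a larger ratelimit_pages; the per-node breakdown only shows which destination nodes were being throttled, not why.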