On Fri, Jun 08, 2018 at 01:02:54PM +0200, Jirka Hladky wrote:
> >
> > Unknown and unknowable. It depends entirely on the reference pattern of
> > the different threads. If they are fully parallelised with private buffers
> > that are page-aligned then I expect it to be quick (to pass the 2-reference
> > filter).
> >
> I'm running 20 parallel processes. There is no connection between them. If
> I read it correctly the migration should happen fast in this case, right?
>
> I have checked the source code and variables are global and static (and
> thus allocated in the data segment). They are NOT 4k aligned:
>
> variable a is at address: 0x9e999e0
> variable b is at address: 0x524e5e0
> variable c is at address: 0x6031e0
>
> static double a[N],
>               b[N],
>               c[N];
>

If these are 20 completely independent processes (and not sharing data via
MPI, if you're using that version of STREAM) then the migration should be
relatively quick. Migrations should start within 3 seconds of the process
starting. How long it takes depends on the size of the STREAM processes,
as the address space is only scanned in chunks and migrations won't start
until there have been two full passes of the address space.

You can partially monitor the progress using /proc/pid/numa_maps. More
detailed monitoring needs ftrace for some activity and the use of probes
on specific functions to get detailed information.

It may also be worth examining /proc/pid/sched and seeing if a task sets
numa_preferred_nid to node 0 and keeps it there even after migrating to
node 1, but that's doubtful.

-- 
Mel Gorman
SUSE Labs
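
The numa_maps monitoring suggested in the reply could be done roughly as
follows. This is only a minimal sketch, not something from the thread: the
helper names (pages_per_node, read_numa_maps) are made up for illustration.
It assumes the usual numa_maps layout, where each mapping line carries
N<node>=<pages> fields, and it simply sums those per node.

```python
import re
import sys
from collections import Counter

# Matches the per-node page count fields in numa_maps, e.g. "N0=512".
NODE_FIELD = re.compile(r"^N(\d+)=(\d+)$")


def pages_per_node(numa_maps_text):
    """Sum the N<node>=<pages> fields over all mappings (hypothetical helper)."""
    totals = Counter()
    for line in numa_maps_text.splitlines():
        # Fields 0 and 1 are the start address and the memory policy.
        for field in line.split()[2:]:
            match = NODE_FIELD.match(field)
            if match:
                totals[int(match.group(1))] += int(match.group(2))
    return dict(totals)


def read_numa_maps(pid):
    """Return the raw contents of /proc/<pid>/numa_maps (hypothetical helper)."""
    with open(f"/proc/{pid}/numa_maps") as f:
        return f.read()


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].isdigit():
    # Print one snapshot; rerun periodically to watch pages move between nodes.
    for node, pages in sorted(pages_per_node(read_numa_maps(sys.argv[1])).items()):
        print(f"node {node}: {pages} pages")
```

Running it repeatedly against one of the STREAM pids (for example under
watch) should show the counts shifting from one node to the other as pages
migrate. The totals are in pages of each mapping's own page size, so
mappings backed by huge pages are undercounted relative to 4k ones in this
simple sum.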