Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx> writes:

>> >>
>> >> if (lr_ratio >= NUMA_PERIOD_THRESHOLD)
>> >>     slow down scanning
>> >> else if (sp_ratio >= NUMA_PERIOD_THRESHOLD) {
>> >>     if (NUMA_PERIOD_SLOTS - lr_ratio >= NUMA_PERIOD_THRESHOLD)
>> >>         speed up scanning
>>
>> Thought about this again.  For example, a multi-threads workload runs on
>> a 4-sockets machine, and most memory accesses are shared.  The optimal
>> situation will be pseudo-interleaving, that is, spreading memory
>> accesses evenly among 4 NUMA nodes.  Where "share" >> "private", and
>> "remote" > "local".  And we should slow down scanning to reduce the
>> overhead.
>>
>> What do you think about this?
>
> If all 4 nodes have equal access, then all 4 nodes will be active nodes.
>
> From task_numa_fault()
>
>	if (!priv && !local && ng && ng->active_nodes > 1 &&
>				numa_is_active_node(cpu_node, ng) &&
>				numa_is_active_node(mem_node, ng))
>		local = 1;
>
> Hence all accesses will be accounted as local. Hence scanning would slow
> down.

Yes.  You are right!  Thanks a lot!

There may be another case.  For example, a workload with 9 threads runs
on a 2-socket machine, and most memory accesses are shared.  7 threads
run on node 0 and 2 threads run on node 1 because of CPU load balancing.
Then the 2 threads on node 1 will have "share" >> "private" and
"remote" >> "local", so the check above would speed up scanning for
them.  But speeding up scanning doesn't help in this case.

Best Regards,
Huang, Ying
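
To make the 9-thread case concrete, here is a minimal sketch of the quoted pseudocode as a standalone C function. It is not the kernel implementation. The constant values (10 and 7) are assumed to match kernel/sched/fair.c. The names `ratio_slots()`, `proposed_scan_action()` and `enum scan_action` are hypothetical, and `sp_ratio` is assumed to mean the shared fraction of faults. The quoted pseudocode is truncated, so every case it does not show is reported as "unchanged" here.

```c
/* Assumed to match kernel/sched/fair.c; treat both values as assumptions. */
#define NUMA_PERIOD_SLOTS     10
#define NUMA_PERIOD_THRESHOLD 7

/* Hypothetical result type, for illustration only. */
enum scan_action { SCAN_SLOW_DOWN, SCAN_SPEED_UP, SCAN_UNCHANGED };

/* Fraction part / (part + rest), scaled to 0..NUMA_PERIOD_SLOTS. */
static int ratio_slots(unsigned long part, unsigned long rest)
{
	unsigned long total = part + rest;

	return total ? (int)(part * NUMA_PERIOD_SLOTS / total) : 0;
}

/* Decision from the quoted pseudocode, driven by raw fault counts. */
static enum scan_action proposed_scan_action(unsigned long local,
					     unsigned long remote,
					     unsigned long shared,
					     unsigned long priv)
{
	int lr_ratio = ratio_slots(local, remote);	/* local fraction  */
	int sp_ratio = ratio_slots(shared, priv);	/* shared fraction */

	if (lr_ratio >= NUMA_PERIOD_THRESHOLD)
		return SCAN_SLOW_DOWN;

	/* Mostly shared and mostly remote: the proposal speeds up scanning. */
	if (sp_ratio >= NUMA_PERIOD_THRESHOLD &&
	    NUMA_PERIOD_SLOTS - lr_ratio >= NUMA_PERIOD_THRESHOLD)
		return SCAN_SPEED_UP;

	/* Remaining cases are not shown in the quoted pseudocode. */
	return SCAN_UNCHANGED;
}
```

With made-up fault counts for one of the 2 threads on node 1, say 10 local, 90 remote, 95 shared and 5 private, `proposed_scan_action(10, 90, 95, 5)` takes the speed-up branch, which is the behaviour questioned above. In the 4-node case, once task_numa_fault() accounts all accesses as local, the first branch fires and scanning slows down.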