Phil Auld <pauld@xxxxxxxxxx> writes: > Hi, > > On Tue, Sep 22, 2020 at 02:54:01PM +0800 Huang Ying wrote: >> Now, AutoNUMA can only optimize the page placement among the NUMA nodes if the >> default memory policy is used. Because the memory policy specified explicitly >> should take precedence. But this seems too strict in some situations. For >> example, on a system with 4 NUMA nodes, if the memory of an application is bound >> to the node 0 and 1, AutoNUMA can potentially migrate the pages between the node >> 0 and 1 to reduce cross-node accessing without breaking the explicit memory >> binding policy. >> >> So in this patch, if mbind(.mode=MPOL_BIND, .flags=MPOL_MF_LAZY) is used to bind >> the memory of the application to multiple nodes, and in the hint page fault >> handler both the faulting page node and the accessing node are in the policy >> nodemask, the page will be tried to be migrated to the accessing node to reduce >> the cross-node accessing. >> > > Do you have any performance numbers that show the effects of this on > a workload? I have done some simple test to confirm that NUMA balancing works in the target configuration. As for performance numbers, it's exactly same as that of the original NUMA balancing in a different configuration. Between without memory binding and with memory bound to all NUMA nodes. > >> [Peter Zijlstra: provided the simplified implementation method.] >> >> Questions: >> >> Sysctl knob kernel.numa_balancing can enable/disable AutoNUMA optimizing >> globally. But for the memory areas that are bound to multiple NUMA nodes, even >> if the AutoNUMA is enabled globally via the sysctl knob, we still need to enable >> AutoNUMA again with a special flag. Why not just optimize the page placement if >> possible as long as AutoNUMA is enabled globally? The interface would look >> simpler with that. > > > I agree. I think it should try to do this if globally enabled. Thanks! >> >> Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx> >> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> >> Cc: Ingo Molnar <mingo@xxxxxxxxxx> >> Cc: Mel Gorman <mgorman@xxxxxxx> >> Cc: Rik van Riel <riel@xxxxxxxxxx> >> Cc: Johannes Weiner <hannes@xxxxxxxxxxx> >> Cc: "Matthew Wilcox (Oracle)" <willy@xxxxxxxxxxxxx> >> Cc: Dave Hansen <dave.hansen@xxxxxxxxx> >> Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx> >> Cc: Michal Hocko <mhocko@xxxxxxxx> >> Cc: David Rientjes <rientjes@xxxxxxxxxx> >> --- >> mm/mempolicy.c | 17 +++++++++++------ >> 1 file changed, 11 insertions(+), 6 deletions(-) >> >> diff --git a/mm/mempolicy.c b/mm/mempolicy.c >> index eddbe4e56c73..273969204732 100644 >> --- a/mm/mempolicy.c >> +++ b/mm/mempolicy.c >> @@ -2494,15 +2494,19 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long >> break; >> >> case MPOL_BIND: >> - >> /* >> - * allows binding to multiple nodes. >> - * use current page if in policy nodemask, >> - * else select nearest allowed node, if any. >> - * If no allowed nodes, use current [!misplaced]. >> + * Allows binding to multiple nodes. If both current and >> + * accessing nodes are in policy nodemask, migrate to >> + * accessing node to optimize page placement. Otherwise, >> + * use current page if in policy nodemask, else select >> + * nearest allowed node, if any. If no allowed nodes, use >> + * current [!misplaced]. >> */ >> - if (node_isset(curnid, pol->v.nodes)) >> + if (node_isset(curnid, pol->v.nodes)) { >> + if (node_isset(thisnid, pol->v.nodes)) >> + goto moron; > > Nice label :) OK. Because quite some people pay attention to this. I will rename all "moron" to "mopron" as suggested by Matthew. Although MPOL_F_MORON is defined in include/uapi/linux/mempolicy.h, it is explicitly marked as internal flags. Best Regards, Huang, Ying >> goto out; >> + } >> z = first_zones_zonelist( >> node_zonelist(numa_node_id(), GFP_HIGHUSER), >> gfp_zone(GFP_HIGHUSER), >> @@ -2516,6 +2520,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long >> >> /* Migrate the page towards the node whose CPU is referencing it */ >> if (pol->flags & MPOL_F_MORON) { >> +moron: >> polnid = thisnid; >> >> if (!should_numa_migrate_memory(current, page, curnid, thiscpu)) >> -- >> 2.28.0 >> > > > Cheers, > Phil