Hi, On Tue, Sep 22, 2020 at 02:54:01PM +0800 Huang Ying wrote: > Now, AutoNUMA can only optimize the page placement among the NUMA nodes if the > default memory policy is used. Because the memory policy specified explicitly > should take precedence. But this seems too strict in some situations. For > example, on a system with 4 NUMA nodes, if the memory of an application is bound > to the node 0 and 1, AutoNUMA can potentially migrate the pages between the node > 0 and 1 to reduce cross-node accessing without breaking the explicit memory > binding policy. > > So in this patch, if mbind(.mode=MPOL_BIND, .flags=MPOL_MF_LAZY) is used to bind > the memory of the application to multiple nodes, and in the hint page fault > handler both the faulting page node and the accessing node are in the policy > nodemask, the page will be tried to be migrated to the accessing node to reduce > the cross-node accessing. > Do you have any performance numbers that show the effects of this on a workload? > [Peter Zijlstra: provided the simplified implementation method.] > > Questions: > > Sysctl knob kernel.numa_balancing can enable/disable AutoNUMA optimizing > globally. But for the memory areas that are bound to multiple NUMA nodes, even > if the AutoNUMA is enabled globally via the sysctl knob, we still need to enable > AutoNUMA again with a special flag. Why not just optimize the page placement if > possible as long as AutoNUMA is enabled globally? The interface would look > simpler with that. I agree. I think it should try to do this if globally enabled. > > Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx> > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > Cc: Ingo Molnar <mingo@xxxxxxxxxx> > Cc: Mel Gorman <mgorman@xxxxxxx> > Cc: Rik van Riel <riel@xxxxxxxxxx> > Cc: Johannes Weiner <hannes@xxxxxxxxxxx> > Cc: "Matthew Wilcox (Oracle)" <willy@xxxxxxxxxxxxx> > Cc: Dave Hansen <dave.hansen@xxxxxxxxx> > Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx> > Cc: Michal Hocko <mhocko@xxxxxxxx> > Cc: David Rientjes <rientjes@xxxxxxxxxx> > --- > mm/mempolicy.c | 17 +++++++++++------ > 1 file changed, 11 insertions(+), 6 deletions(-) > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c > index eddbe4e56c73..273969204732 100644 > --- a/mm/mempolicy.c > +++ b/mm/mempolicy.c > @@ -2494,15 +2494,19 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long > break; > > case MPOL_BIND: > - > /* > - * allows binding to multiple nodes. > - * use current page if in policy nodemask, > - * else select nearest allowed node, if any. > - * If no allowed nodes, use current [!misplaced]. > + * Allows binding to multiple nodes. If both current and > + * accessing nodes are in policy nodemask, migrate to > + * accessing node to optimize page placement. Otherwise, > + * use current page if in policy nodemask, else select > + * nearest allowed node, if any. If no allowed nodes, use > + * current [!misplaced]. > */ > - if (node_isset(curnid, pol->v.nodes)) > + if (node_isset(curnid, pol->v.nodes)) { > + if (node_isset(thisnid, pol->v.nodes)) > + goto moron; Nice label :) > goto out; > + } > z = first_zones_zonelist( > node_zonelist(numa_node_id(), GFP_HIGHUSER), > gfp_zone(GFP_HIGHUSER), > @@ -2516,6 +2520,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long > > /* Migrate the page towards the node whose CPU is referencing it */ > if (pol->flags & MPOL_F_MORON) { > +moron: > polnid = thisnid; > > if (!should_numa_migrate_memory(current, page, curnid, thiscpu)) > -- > 2.28.0 > Cheers, Phil --