Re: [PATCH -V2 2/2] autonuma: Migrate on fault among multiple bound nodes

Ben Widawsky <ben.widawsky@xxxxxxxxx> · Fri, 6 Nov 2020 07:55:03 -0800

On 20-11-06 15:28:59, Huang, Ying wrote:
> Mel Gorman <mgorman@xxxxxxx> writes:
> 
> > On Wed, Nov 04, 2020 at 01:36:58PM +0800, Huang, Ying wrote:
> >> But from another point of view, I suggest removing the MPOL_F_MOF
> >> constraint in the future.  If the overhead of AutoNUMA isn't
> >> acceptable, why not just disable AutoNUMA globally via the sysctl knob?
> >> 
> >
> > Because it's a double-edged sword. NUMA Balancing can make a workload
> > faster while still incurring more overhead than it should -- particularly
> > when multiple threads are rescanning the same or unrelated regions.
> > Global disabling should really only happen when the application is the
> > only one running on the machine and has full NUMA awareness.
> 
> Got it.  So on one machine, NUMA Balancing may benefit some workloads
> but hurt others, and we need a way to enable/disable NUMA Balancing per
> workload.  Previously, this was done via an explicit NUMA policy: if an
> explicit NUMA policy is specified, NUMA Balancing is disabled for that
> memory region or thread.  For a memory region, this can be reverted via
> MPOL_MF_LAZY, but there is no MPOL_MF_LAZY equivalent for a thread yet.
> 
> >> > It might still end up being better but I was not aware of a
> >> > *realistic* workload that binds to multiple nodes
> >> > deliberately. Generally I expect if an application is binding, it's
> >> > binding to one local node.
> >> 
> >> Yes.  It's not a popular configuration for now.  But on a memory
> >> tiering system with both DRAM and PMEM, the DRAM and PMEM in one socket
> >> become two NUMA nodes.  To avoid excessive cross-socket memory access
> >> while still taking advantage of both the DRAM and the PMEM, the
> >> workload can be bound to those 2 NUMA nodes (DRAM and PMEM).
> >> 
> >
> > Ok, that may lead to unpredictable, variable performance, with limited
> > control over which "important" applications get DRAM rather than PMEM.
> > That's a long road, but this step is not incompatible with the
> > long-term goal.
> 
> Yes.  Ben Widawsky is working on a patchset that makes it possible to
> prefer remote DRAM over local PMEM:
> 
> https://lore.kernel.org/linux-mm/20200630212517.308045-1-ben.widawsky@xxxxxxxxx/
> 
> Best Regards,
> Huang, Ying
> 
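For illustration, a minimal sketch of the two-node binding Huang describes above: a thread restricting its allocations to one socket's DRAM node plus its PMEM node via MPOL_BIND. This is not from the patch under discussion. Assumptions: it calls set_mempolicy(2) through the raw syscall rather than libnuma, hardcodes MPOL_BIND as 2 under the hypothetical name SKETCH_MPOL_BIND, and the helpers build_nodemask() and bind_to_nodes() are hypothetical names used only here.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: MPOL_BIND == 2, as in the uapi <linux/mempolicy.h> enum.
 * Defined locally (hypothetical name) so the sketch builds without
 * libnuma's <numaif.h>. */
#define SKETCH_MPOL_BIND 2

/* Width of the single-word node mask used by this sketch. */
#define SKETCH_MASK_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: build a node mask with one bit set per NUMA
 * node ID. Returns 0 (an empty, invalid mask) if any ID is negative or
 * would land in the top bit of the word. The top bit is conservatively
 * left unused because the kernel may read one bit fewer than the
 * maxnode value passed below. */
unsigned long build_nodemask(const int *nodes, size_t count)
{
    unsigned long mask = 0;

    for (size_t i = 0; i < count; i++) {
        if (nodes[i] < 0 || (size_t)nodes[i] >= SKETCH_MASK_BITS - 1)
            return 0;
        mask |= 1UL << nodes[i];
    }
    return mask;
}

/* Hypothetical helper: bind the calling thread's future page
 * allocations to the given nodes with MPOL_BIND.
 * Returns 0 on success, -1 with errno set on failure. */
int bind_to_nodes(const int *nodes, size_t count)
{
    unsigned long mask = build_nodemask(nodes, count);

    if (mask == 0) {
        errno = EINVAL;
        return -1;
    }
    /* Raw set_mempolicy(2) call; glibc provides no wrapper for it. */
    if (syscall(SYS_set_mempolicy, SKETCH_MPOL_BIND, &mask,
                (unsigned long)SKETCH_MASK_BITS) != 0)
        return -1;
    return 0;
}
```

On a hypothetical machine where node 0 is socket-0 DRAM and node 2 is socket-0 PMEM, bind_to_nodes((int[]){0, 2}, 2) would restrict the thread to those two nodes. Actual node numbering varies by platform (numactl --hardware shows it). Per the subject line, the patch under discussion concerns migrate-on-fault among the nodes of exactly this kind of multi-node binding.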

A rebased version was posted here:
https://lore.kernel.org/linux-mm/20201030190238.306764-1-ben.widawsky@xxxxxxxxx/

Thanks.
Ben