Re: [RFC PATCH v2 2/3] mm/mempolicy: add set_mempolicy_home_node syscall

On 10/21/21 13:02, Feng Tang wrote:
Hi Aneesh,

On Wed, Oct 20, 2021 at 02:54:52PM +0530, Aneesh Kumar K.V wrote:
This syscall can be used to set a home node for the MPOL_BIND
and MPOL_PREFERRED_MANY memory policy. Users should use this
syscall after setting up a memory policy for the specified range
as shown below.

mbind(p, nr_pages * page_size, MPOL_BIND, new_nodes->maskp,
	    new_nodes->size + 1, 0);
sys_set_mempolicy_home_node((unsigned long)p, nr_pages * page_size,
				  home_node, 0);

The syscall allows specifying a home node/preferred node from which the kernel
will fulfill memory allocation requests first.

For an address range with the MPOL_BIND memory policy, if the nodemask specifies
more than one node, page allocations will come from the node in the nodemask
with sufficient free memory that is closest to the home node/preferred node.

For MPOL_PREFERRED_MANY, if the nodemask specifies more than one node,
page allocations will come from the node in the nodemask with sufficient
free memory that is closest to the home node/preferred node. If none of the
nodes in the nodemask has enough memory, the allocation will be attempted
from the NUMA node in the system that is closest to the home node.
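As a rough illustration of the two fallback rules described above, the sketch below models node selection in plain user-space C. It is not the kernel's implementation: `struct numa_model`, `NR_NODES`, `closest_node_in_mask()`, `pick_bind()` and `pick_preferred_many()` are hypothetical names, the distance table and free-page counts are made-up inputs, and ties are assumed to go to the lowest-numbered node.

```c
#include <assert.h>

#define NR_NODES 4	/* hypothetical node count for this sketch */

/* Hypothetical inputs: a NUMA distance table and per-node free pages. */
struct numa_model {
	int distance[NR_NODES][NR_NODES];
	unsigned long free_pages[NR_NODES];
};

/*
 * Return the node closest to 'home' that has at least 'need' free pages,
 * looking only at nodes set in 'mask' (bit n stands for node n).
 * Returns -1 if no node in the mask qualifies.
 */
static int closest_node_in_mask(const struct numa_model *m, unsigned long mask,
				int home, unsigned long need)
{
	int best = -1;

	assert(home >= 0 && home < NR_NODES);
	for (int n = 0; n < NR_NODES; n++) {
		if (!(mask & (1UL << n)) || m->free_pages[n] < need)
			continue;
		if (best < 0 || m->distance[home][n] < m->distance[home][best])
			best = n;
	}
	return best;
}

/* MPOL_BIND with a home node: the allocation never leaves the nodemask. */
static int pick_bind(const struct numa_model *m, unsigned long mask,
		     int home, unsigned long need)
{
	return closest_node_in_mask(m, mask, home, need);
}

/*
 * MPOL_PREFERRED_MANY with a home node: try the nodemask first, then
 * fall back to the closest node in the whole system.
 */
static int pick_preferred_many(const struct numa_model *m, unsigned long mask,
			       int home, unsigned long need)
{
	int n = closest_node_in_mask(m, mask, home, need);

	if (n >= 0)
		return n;
	return closest_node_in_mask(m, ~0UL, home, need);
}
```

With a nodemask holding only an exhausted home node, `pick_bind()` reports failure (-1) while `pick_preferred_many()` falls through to the nearest node outside the mask, which is the difference between the two policies described above.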

I can understand the requirement for MPOL_BIND. For MPOL_PREFERRED_MANY,
this provides 3 levels of preference:
   home node --> preferred nodes --> all nodes
Are there any real use cases for this? For a platform which may have 3 types
of memory (HBM, DRAM, PMEM), this may be useful.

The patch is based on the need to let an application that already uses MPOL_PREFERRED to hint a preferred node run on a system with different types of memory (fast and slow).


This helps applications to hint at a memory allocation preference node
and fallback to _only_ a set of nodes if the memory is not available
on the preferred node.  Fallback allocation is attempted from the node which is
nearest to the preferred node.

This gives applications control over which NUMA nodes memory is allocated from
and avoids the default fallback to slow-memory NUMA nodes. For example, on a
system where NUMA nodes 1, 2 and 3 have DRAM and nodes 10, 11 and 12 have slow
memory:

  new_nodes = numa_bitmask_alloc(nr_nodes);

  numa_bitmask_setbit(new_nodes, 1);
  numa_bitmask_setbit(new_nodes, 2);
  numa_bitmask_setbit(new_nodes, 3);

  p = mmap(NULL, nr_pages * page_size, protflag, mapflag, -1, 0);
  mbind(p, nr_pages * page_size, MPOL_BIND, new_nodes->maskp,  new_nodes->size + 1, 0);

  sys_set_mempolicy_home_node(p, nr_pages * page_size, 2, 0);
This example uses 'mbind + sys_set_mempolicy_home_node'. Will the case
'set_mempolicy + sys_set_mempolicy_home_node' also be supported?


That has not been asked for at this point, so the patch looks up the VMA policy to set the home node. If there is a need to set a home node for a task, we can look at adding that. I have kept the flags argument, which should help us accommodate such a request if we get one in the future.

-aneesh
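For readers who want to try the call from user space, a minimal wrapper might look like the sketch below. It assumes that the installed kernel headers define `__NR_set_mempolicy_home_node` (the number was not settled in this RFC) and that the four-argument form used in the examples above is kept; `set_home_node()` is a hypothetical helper name, not a libc function. Since the flags argument is kept only for future extension, passing 0 is the only usage the thread describes.

```c
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical user-space wrapper. Returns 0 on success, or -1 with
 * errno set on failure. If the build-time headers do not know the
 * syscall, report ENOSYS instead of guessing a syscall number.
 */
static long set_home_node(unsigned long start, unsigned long len,
			  unsigned long home_node, unsigned long flags)
{
#ifdef __NR_set_mempolicy_home_node
	return syscall(__NR_set_mempolicy_home_node, start, len,
		       home_node, flags);
#else
	(void)start; (void)len; (void)home_node; (void)flags;
	errno = ENOSYS;
	return -1;
#endif
}
```

A caller would first install a VMA policy with mbind() on the range and then call `set_home_node((unsigned long)p, len, node, 0)`, checking errno on failure, since a kernel without the patch is expected to fail the call with ENOSYS.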


