On Thu 02-12-21 18:08:09, Aneesh Kumar K.V wrote: > This syscall can be used to set a home node for the MPOL_BIND > and MPOL_PREFERRED_MANY memory policy. Users should use this > syscall after setting up a memory policy for the specified range > as shown below. > > mbind(p, nr_pages * page_size, MPOL_BIND, new_nodes->maskp, > new_nodes->size + 1, 0); > sys_set_mempolicy_home_node((unsigned long)p, nr_pages * page_size, > home_node, 0); > > The syscall allows specifying a home node/preferred node from which kernel > will fulfill memory allocation requests first. > > For address range with MPOL_BIND memory policy, if nodemask specifies more > than one node, page allocations will come from the node in the nodemask > with sufficient free memory that is closest to the home node/preferred node. > > For MPOL_PREFERRED_MANY if the nodemask specifies more than one node, > page allocation will come from the node in the nodemask with sufficient > free memory that is closest to the home node/preferred node. If there is > not enough memory in all the nodes specified in the nodemask, the allocation > will be attempted from the closest numa node to the home node in the system. > > This helps applications to hint at a memory allocation preference node > and fallback to _only_ a set of nodes if the memory is not available > on the preferred node. Fallback allocation is attempted from the node which is > nearest to the preferred node. > > This helps applications to have control on memory allocation numa nodes and > avoids default fallback to slow memory NUMA nodes. For example a system with > NUMA nodes 1,2 and 3 with DRAM memory and 10, 11 and 12 of slow memory > > new_nodes = numa_bitmask_alloc(nr_nodes); > > numa_bitmask_setbit(new_nodes, 1); > numa_bitmask_setbit(new_nodes, 2); > numa_bitmask_setbit(new_nodes, 3); > > p = mmap(NULL, nr_pages * page_size, protflag, mapflag, -1, 0); > mbind(p, nr_pages * page_size, MPOL_BIND, new_nodes->maskp, new_nodes->size + 1, 0); > > sys_set_mempolicy_home_node(p, nr_pages * page_size, 2, 0); > > This will allocate from nodes closer to node 2 and will make sure the kernel will > only allocate from nodes 1, 2, and 3. Memory will not be allocated from slow memory > nodes 10, 11, and 12. This differs from default MPOL_BIND behavior in that with > default MPOL_BIND the allocation will be attempted from node closer to the local node. > One of the reasons to specify a home node is to allow allocations from cpu less > NUMA node and its nearby NUMA nodes. > > With MPOL_PREFERRED_MANY on the other hand will first try to allocate from the > closest node to node 2 from the node list 1, 2 and 3. If those nodes don't have > enough memory, kernel will allocate from slow memory node 10, 11 and 12 which > ever is closer to node 2. > > Cc: Ben Widawsky <ben.widawsky@xxxxxxxxx> > Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> > Cc: Feng Tang <feng.tang@xxxxxxxxx> > Cc: Michal Hocko <mhocko@xxxxxxxxxx> > Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx> > Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> > Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx> > Cc: Randy Dunlap <rdunlap@xxxxxxxxxxxxx> > Cc: Vlastimil Babka <vbabka@xxxxxxx> > Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx> > Cc: Dan Williams <dan.j.williams@xxxxxxxxx> > Cc: Huang Ying <ying.huang@xxxxxxxxx> > Cc: linux-api@xxxxxxxxxxxxxxx > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx> LGTM Acked-by: Michal Hocko <mhocko@xxxxxxxx> Thanks! -- Michal Hocko SUSE Labs