On Wed 13-10-21 18:05:49, Aneesh Kumar K.V wrote:
> On 10/13/21 16:18, Michal Hocko wrote:
> > On Wed 13-10-21 12:42:34, Michal Hocko wrote:
> > > [Cc linux-api]
> > >
> > > On Wed 13-10-21 15:15:39, Aneesh Kumar K.V wrote:
> > > > This mempolicy mode can be used with either the set_mempolicy(2)
> > > > or mbind(2) interfaces. Like the MPOL_PREFERRED interface, it
> > > > allows an application to set a preference node from which the kernel
> > > > will fulfill memory allocation requests. Unlike the MPOL_PREFERRED mode,
> > > > it takes a set of nodes. The nodes in the nodemask are used as fallback
> > > > allocation nodes if memory is not available on the preferred node.
> > > > Unlike MPOL_PREFERRED_MANY, it will not fall back memory allocations
> > > > to all nodes in the system. Like the MPOL_BIND interface, it works over a
> > > > set of nodes and will cause a SIGSEGV or invoke the OOM killer if
> > > > memory is not available on those preferred nodes.
> > > >
> > > > This patch helps applications to hint a memory allocation preference node
> > > > and fallback to _only_ a set of nodes if the memory is not available
> > > > on the preferred node. Fallback allocation is attempted from the node which is
> > > > nearest to the preferred node.
> > > >
> > > > This new memory policy helps applications to have explicit control on slow
> > > > memory allocation and avoids default fallback to slow memory NUMA nodes.
> > > > The difference with MPOL_BIND is the ability to specify a preferred node
> > > > which is the first node in the nodemask argument passed.
> >
> > I am sorry but I do not understand the semantic difference from
> > MPOL_BIND. Could you be more specific please?
> >
>
> MPOL_BIND
>       This mode specifies that memory must come from the set of
>       nodes specified by the policy. Memory will be allocated from
>       the node in the set with sufficient free memory that is
>       closest to the node where the allocation takes place.
>
> MPOL_PREFERRED_STRICT
>       This mode specifies that the allocation should be attempted
>       from the first node specified in the nodemask of the policy.
>       If that allocation fails, the kernel will search other nodes
>       in the nodemask, in order of increasing distance from the
>       preferred node based on information provided by the platform firmware.
>
> The difference is the ability to specify the preferred node as the first
> node in the nodemask and all fallback allocations are based on the distance
> from the preferred node. With MPOL_BIND they are based on the node where
> the allocation takes place.

OK, this makes it clearer. Thanks! I am still not sure the semantics
make sense though. Why should the lowest node in the nodemask have any
special meaning? What if it is a node with a higher number that somebody
prefers to start with?
-- 
Michal Hocko
SUSE Labs