Re: [RFC PATCH] mm/mempolicy: add MPOL_PREFERRED_STRICT memory policy

Michal Hocko <mhocko@xxxxxxxx> · Wed, 13 Oct 2021 14:50:22 +0200

On Wed 13-10-21 18:05:49, Aneesh Kumar K.V wrote:
> On 10/13/21 16:18, Michal Hocko wrote:
> > On Wed 13-10-21 12:42:34, Michal Hocko wrote:
> > > [Cc linux-api]
> > > 
> > > On Wed 13-10-21 15:15:39, Aneesh Kumar K.V wrote:
> > > > This mempolicy mode can be used with either the set_mempolicy(2)
> > > > or mbind(2) interfaces.  Like the MPOL_PREFERRED interface, it
> > > > allows an application to set a preference node from which the kernel
> > > > will fulfill memory allocation requests. Unlike the MPOL_PREFERRED mode,
> > > > it takes a set of nodes. The nodes in the nodemask are used as fallback
> > > > allocation nodes if memory is not available on the preferred node.
> > > > Unlike MPOL_PREFERRED_MANY, it will not fall back memory allocations
> > > > to all nodes in the system. Like the MPOL_BIND interface, it works over a
> > > > set of nodes and will cause a SIGSEGV or invoke the OOM killer if
> > > > memory is not available on those preferred nodes.
> > > > 
> > > > This patch helps applications to hint a memory allocation preference node
> > > > and fall back to _only_ a set of nodes if the memory is not available
> > > > on the preferred node.  Fallback allocation is attempted from the node which is
> > > > nearest to the preferred node.
> > > > 
> > > > This new memory policy gives applications explicit control over slow
> > > > memory allocation and avoids default fallback to slow memory NUMA nodes.
> > > > The difference with MPOL_BIND is the ability to specify a preferred node
> > > > which is the first node in the nodemask argument passed.
> > 
> > I am sorry but I do not understand the semantic difference from
> > MPOL_BIND. Could you be more specific please?
> > 
> 
> MPOL_BIND
> 	This mode specifies that memory must come from the set of
> 	nodes specified by the policy.  Memory will be allocated from
> 	the node in the set with sufficient free memory that is
> 	closest to the node where the allocation takes place.
> 
> 
> MPOL_PREFERRED_STRICT
> 	This mode specifies that the allocation should be attempted
> 	from the first node specified in the nodemask of the policy.
> 	If that allocation fails, the kernel will search other nodes
> 	in the nodemask, in order of increasing distance from the
> 	preferred node based on information provided by the platform
> 	firmware.
> 
> The difference is the ability to specify the preferred node as the first
> node in the nodemask and all fallback allocations are based on the distance
> from the preferred node. With MPOL_BIND they are based on the node where
> the allocation takes place.
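
To make the distinction concrete, here is a minimal user-space sketch of
the two fallback orders as described above. It is not kernel code: the
4-node distance table, MAX_NODES and fallback_order() are made up for
illustration, and the model ignores free memory and zonelists entirely.
Only the starting point of the distance sort differs between the two
policies.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4

/*
 * Hypothetical distance table in the style of what platform firmware
 * reports: 10 means local, larger means farther away.
 */
const int node_distance[MAX_NODES][MAX_NODES] = {
	{ 10, 20, 40, 40 },
	{ 20, 10, 40, 40 },
	{ 40, 40, 10, 20 },
	{ 40, 40, 20, 10 },
};

/*
 * Fill order[] with the nodes set in mask, sorted by increasing
 * distance from origin, and return how many nodes were written.
 * Nodes at equal distance keep their ascending node-number order.
 */
size_t fallback_order(unsigned long mask, int origin, int order[MAX_NODES])
{
	size_t n = 0;

	assert(origin >= 0 && origin < MAX_NODES);

	for (int node = 0; node < MAX_NODES; node++) {
		size_t i;

		if (!(mask & (1UL << node)))
			continue;

		/* Insertion sort keyed on distance from origin. */
		i = n++;
		while (i > 0 && node_distance[origin][order[i - 1]] >
				node_distance[origin][node]) {
			order[i] = order[i - 1];
			i--;
		}
		order[i] = node;
	}
	return n;
}
```

With the nodemask {0, 2, 3} (0xd) and a task running on node 3,
fallback_order(0xd, 3, order) models MPOL_BIND and gives 3, 2, 0.
Passing the first node of the mask instead, fallback_order(0xd, 0, order),
models the proposed MPOL_PREFERRED_STRICT and gives 0, 2, 3.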

OK, this makes it clearer. Thanks!

I am still not sure the semantics make sense though. Why should
the lowest node in the nodemask have any special meaning? What if it is
a node with a higher number that somebody prefers to start with?
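
The concern can be illustrated with a small user-space sketch, assuming
the nodemask is a plain bitmask as passed to set_mempolicy(2) and
mbind(2). The helpers build_nodemask() and lowest_node() are hypothetical
names used only here, and the mask is limited to a single unsigned long
for brevity. The order in which the caller lists the nodes leaves no
trace in the mask, so "first node" can only mean the lowest-numbered one.

```c
#include <assert.h>
#include <stddef.h>

#define BITS_PER_ULONG (8 * sizeof(unsigned long))

/*
 * Build a single-word nodemask from a list of node ids. Setting bits
 * is commutative, so the listing order is not preserved.
 */
unsigned long build_nodemask(const int *nodes, size_t count)
{
	unsigned long mask = 0;

	for (size_t i = 0; i < count; i++) {
		assert(nodes[i] >= 0 && (size_t)nodes[i] < BITS_PER_ULONG);
		mask |= 1UL << nodes[i];
	}
	return mask;
}

/* Return the lowest-numbered node set in mask, or -1 if it is empty. */
int lowest_node(unsigned long mask)
{
	for (size_t node = 0; node < BITS_PER_ULONG; node++)
		if (mask & (1UL << node))
			return (int)node;
	return -1;
}
```

Listing the nodes as {3, 1} or as {1, 3} produces the same mask, and
lowest_node() returns 1 for both. A caller who wants node 3 as the
preferred node with node 1 as fallback has no way to say so through the
mask alone.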
-- 
Michal Hocko
SUSE Labs