Re: [RFC PATCH] mm/mempolicy: add MPOL_PREFERRED_STRICT memory policy

"Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxx> · Thu, 14 Oct 2021 21:20:51 +0530

On 10/14/21 20:26, Michal Hocko wrote:
On Thu 14-10-21 18:59:14, Aneesh Kumar K.V wrote:
On 10/14/21 17:11, Michal Hocko wrote:
On Thu 14-10-21 15:58:29, Aneesh Kumar K.V wrote:
On 10/14/21 15:08, Michal Hocko wrote:
[...]
Besides that it would be really great to finish the discussion about the
usecase before suggesting a new userspace API.

The application would like to hint a preferred node for allocating the
memory backing a va range and, at the same time, wants to avoid fallback to
some set of nodes (in the use case I am interested in, it should not fall
back to slow memory nodes).

We do have means for that, right? You can set your memory policy and
then set the cpu affinity to the node you want to allocate from
initially. You can migrate to a different cpu/node if this is not the
preferred affinity. Why is that not usable?

For the same reason you mentioned earlier: these nodes can be cpu-less
nodes.

It would have been easier if you were explicit about the usecase rather
than let others guess.

Also think about extensibility. Say I want to allocate from a set of
nodes first before falling back to the rest of the nodemask? If you want
to add a new API then think of other potential usecases.

Describing the specific allocation details becomes hard with the preferred
node being a nodemask. With the below interface

SYSCALL_DEFINE5(preferred_mbind, unsigned long, start, unsigned long, len,
		const unsigned long __user *, preferred_nmask,
		const unsigned long __user *, fallback_nmask,
		unsigned long, maxnode)
{

1. The preferred node is the first node in the preferred node mask.
2. Then we try to allocate from the nodes in the preferred node mask
that are closest to the first node in the preferred node mask.
3. If the above fails, we try to allocate from the nodes in the fallback
node mask that are closest to the first node in the preferred node mask.
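
The three steps above can be sketched in userspace C as follows. This is
only an illustration of the proposed ordering, not kernel code: the 4-node
distance table, MAX_NODES and the function names are hypothetical, and ties
are broken by lower node id as an assumption.

```c
#include <stddef.h>

#define MAX_NODES 4	/* hypothetical machine size */

/* Hypothetical distance table: nodes 0-1 are DRAM, nodes 2-3 are slow memory. */
static const int node_dist[MAX_NODES][MAX_NODES] = {
	{ 10, 20, 40, 80 },
	{ 20, 10, 80, 40 },
	{ 40, 80, 10, 20 },
	{ 80, 40, 20, 10 },
};

/* Lowest-numbered node set in mask, or -1 if the mask is empty. */
int first_node_in(unsigned long mask)
{
	for (int n = 0; n < MAX_NODES; n++)
		if (mask & (1UL << n))
			return n;
	return -1;
}

/*
 * Append every node in mask to order[], nearest to home first.
 * Ties go to the lower node id (an assumption). Returns the new count.
 */
size_t append_by_distance(unsigned long mask, int home, int *order, size_t count)
{
	mask &= (1UL << MAX_NODES) - 1;	/* ignore bits beyond MAX_NODES */
	while (mask) {
		int best = -1;

		for (int n = 0; n < MAX_NODES; n++) {
			if (!(mask & (1UL << n)))
				continue;
			if (best < 0 || node_dist[home][n] < node_dist[home][best])
				best = n;
		}
		order[count++] = best;
		mask &= ~(1UL << best);
	}
	return count;
}

/*
 * Build the allocation order for the proposed preferred_mbind():
 * step 1: first node of the preferred mask,
 * step 2: remaining preferred nodes by distance from that node,
 * step 3: fallback nodes by distance from that node.
 * order[] must have room for MAX_NODES entries. Returns the entry count.
 */
size_t preferred_mbind_order(unsigned long preferred, unsigned long fallback,
			     int *order)
{
	unsigned long valid = (1UL << MAX_NODES) - 1;
	size_t count = 0;
	int home;

	preferred &= valid;
	fallback &= valid & ~preferred;	/* a node is listed only once */

	home = first_node_in(preferred);
	if (home < 0)
		return 0;

	order[count++] = home;
	count = append_by_distance(preferred & ~(1UL << home), home, order, count);
	count = append_by_distance(fallback, home, order, count);
	return count;
}
```

With this table, a preferred mask of nodes {0,1} and a fallback mask of
{2} gives the order 0, 1, 2. Node 3 is in neither mask, so it is never
tried.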

Isn't that too complicated? Do we have a real usecase for that?

No, I think this is a suboptimal interface. AFAIU you really want to
define a "home" node(s) rather than any policy. Home node would
effectively override the default local node whatever policy you have as
it makes sense whether you have MPOL_PREFERRED_MANY or MPOL_BIND.

yes. I did describe it as below in an earlier email

"We could do
set_mempolicy(MPOL_PREFERRED, nodemask(nodeX))
set_mempolicy(MPOL_PREFERRED_EXTEND, nodemask(fallback nodemask for
above PREFERRED policy)) "

But I agree that restricting this to a virtual address range is much
better. Now I am wondering whether a nodemask is any better than a
nodeid. The concept of multiple home nodes is more confusing than a
single home node. What would multiple nodes mean in a home nodes concept?

Should we do

SYSCALL_DEFINE4(home_node_mbind, unsigned long, start, unsigned long, len,
		unsigned long, home_node, unsigned long, flags)

The flags argument is kept for future extensions, if any.

I guess this home node will only apply w.r.t. the MPOL_BIND and
MPOL_PREFERRED_MANY policies for now?
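
As a rough model of what a home node could mean for a range bound with
MPOL_BIND, the sketch below picks the node in the policy nodemask that is
nearest to the home node and still has free pages. Everything here is a
userspace illustration under assumptions: NR_MODEL_NODES, the distance
table, the free_pages array and home_node_pick() are hypothetical names,
and the real allocator behaviour may differ.

```c
#include <stddef.h>

#define NR_MODEL_NODES 4	/* hypothetical machine size */

/* Hypothetical distance table: nodes 0-1 are DRAM, nodes 2-3 are slow memory. */
static const int model_dist[NR_MODEL_NODES][NR_MODEL_NODES] = {
	{ 10, 20, 40, 80 },
	{ 20, 10, 80, 40 },
	{ 40, 80, 10, 20 },
	{ 80, 40, 20, 10 },
};

/*
 * Pick the node to allocate from: the node in policy_mask closest to
 * home_node that still has free pages. Nodes outside policy_mask are
 * never used, so a BIND-style mask still excludes slow memory.
 * Returns -1 if home_node is invalid or no allowed node has free pages.
 */
int home_node_pick(unsigned long policy_mask, int home_node,
		   const long *free_pages)
{
	int best = -1;

	if (home_node < 0 || home_node >= NR_MODEL_NODES)
		return -1;

	for (int n = 0; n < NR_MODEL_NODES; n++) {
		if (!(policy_mask & (1UL << n)))
			continue;	/* not allowed by the policy */
		if (free_pages[n] <= 0)
			continue;	/* nothing left on this node */
		if (best < 0 ||
		    model_dist[home_node][n] < model_dist[home_node][best])
			best = n;
	}
	return best;
}
```

In this model the home node only changes where the search starts. The
policy nodemask still decides which nodes may be used at all, which is
why the home node stays independent of the policy.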

Another potential interface would be set_nodeorder which would
explicitly set the allocation fallback ordering. Again agnostic of the
underlying memory policy. This would be more generic but the question is
whether this is not too generic and whether there are usecases for that.

I would suggest we wait for applications that really want a fallback order
other than a distance-based one before adding this. A distance-based
fallback order from a preferred node is well understood from an
application's point of view.

-aneesh