On 10/14/21 20:26, Michal Hocko wrote:
On Thu 14-10-21 18:59:14, Aneesh Kumar K.V wrote:
On 10/14/21 17:11, Michal Hocko wrote:
On Thu 14-10-21 15:58:29, Aneesh Kumar K.V wrote:
On 10/14/21 15:08, Michal Hocko wrote:
[...]
Besides that it would be really great to finish the discussion about the
usecase before suggesting a new userspace API.
The application would like to hint a preferred node for allocating the
memory backing a VA range and, at the same time, wants to avoid falling
back to some set of nodes (in the use case I am interested in, it should
not fall back to slow memory nodes).
We do have means for that, right? You can set your memory policy and
then set the cpu affinity to the node you want to allocate from
initially. You can migrate to a different cpu/node if this is not the
preferred affinity. Why is that not usable?
For the same reason you mentioned earlier: these nodes can be CPU-less
nodes.
It would have been easier if you were explicit about the usecase rather
than letting others guess.
Also think about extensibility. Say I want to allocate from a set of
nodes first before falling back to the rest of the nodemask? If you want
to add a new API then think of other potential usecases.
Describing the specific allocation details becomes hard with the preferred
node being a nodemask. With the below interface
SYSCALL_DEFINE5(preferred_mbind, unsigned long, start, unsigned long, len,
		const unsigned long __user *, preferred_nmask,
		const unsigned long __user *, fallback_nmask,
		unsigned long, maxnode)
{
1. The preferred node is the first node in the preferred node mask.
2. Then we try to allocate from the nodes in the preferred node mask
that are closest to that first node.
3. If the above fails, we try to allocate from the nodes in the fallback
node mask that are closest to the first node in the preferred node mask.
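The three steps above can be sketched in userspace C as a simulation of the resulting allocation order. This is only an illustration under stated assumptions: the 4-node distance table, the single-word bitmask representation, and the helper names (first_node, append_by_distance, preferred_mbind_order) are all hypothetical and are not part of the proposed syscall or of any kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4

/* Hypothetical SLIT-style distance table (assumption for illustration). */
static const int node_distance[MAX_NODES][MAX_NODES] = {
	{10, 20, 40, 80},
	{20, 10, 80, 40},
	{40, 80, 10, 20},
	{80, 40, 20, 10},
};

/* Lowest-numbered node set in the mask, or -1 if the mask is empty. */
static int first_node(unsigned long mask)
{
	for (int n = 0; n < MAX_NODES; n++)
		if (mask & (1UL << n))
			return n;
	return -1;
}

/*
 * Append the nodes of 'mask' to order[] by increasing distance from
 * 'origin', skipping nodes already recorded in *seen. Ties go to the
 * lower node number. Returns the new number of entries in order[].
 */
static size_t append_by_distance(unsigned long mask, int origin,
				 unsigned long *seen, int *order, size_t count)
{
	for (;;) {
		int best = -1;

		for (int n = 0; n < MAX_NODES; n++) {
			if (!(mask & (1UL << n)) || (*seen & (1UL << n)))
				continue;
			if (best < 0 ||
			    node_distance[origin][n] < node_distance[origin][best])
				best = n;
		}
		if (best < 0)
			return count;
		*seen |= 1UL << best;
		order[count++] = best;
	}
}

/*
 * Fill order[] (at least MAX_NODES entries) with the order in which
 * nodes would be tried. Returns the number of entries written.
 */
size_t preferred_mbind_order(unsigned long preferred_mask,
			     unsigned long fallback_mask, int *order)
{
	int home = first_node(preferred_mask);
	unsigned long seen = 0;
	size_t count = 0;

	if (home < 0)
		return 0;
	/* Steps 1 and 2: the first node has the smallest distance to itself. */
	count = append_by_distance(preferred_mask, home, &seen, order, count);
	/* Step 3: remaining candidates come from the fallback mask. */
	count = append_by_distance(fallback_mask, home, &seen, order, count);
	return count;
}
```

With this assumed table, a preferred mask of {1} and a fallback mask of {0, 2, 3} yields the order 1, 0, 3, 2, because distances from node 1 are 20, 80 and 40 respectively.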
Isn't that too complicated? Do we have a real usecase for that?
No, I think this is a suboptimal interface. AFAIU you really want to
define a "home" node(s) rather than any policy. Home node would
effectively override the default local node whatever policy you have as
it makes sense whether you have MPOL_PREFERRED_MANY or MPOL_BIND.
yes. I did describe it as below in an earlier email
"We could do
set_mempolicy(MPOL_PREFERRED, nodemask(nodeX))
set_mempolicy(MPOL_PREFERRED_EXTEND, nodemask(fallback nodemask for
above PREFERRED policy)) "
But I agree that restricting this to a virtual address range is much
better. Now I am wondering whether a nodemask is any better than a
nodeid. The concept of multiple home nodes is confusing compared to a
single home node. What would multiple nodes mean in a home-node concept?
Should we do
SYSCALL_DEFINE4(home_node_mbind, unsigned long, start, unsigned long, len,
unsigned long, home_node, unsigned long, flags)
The flags argument is kept for future extension, if any.
I guess this home node will only apply to the MPOL_BIND and
MPOL_PREFERRED_MANY policies for now?
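One possible reading of the home-node idea can be sketched as a small userspace simulation: the policy's nodemask still restricts where memory may come from, while the home node replaces the local node as the origin for distance-based selection. The distance table, the range_policy struct, NO_HOME_NODE, and first_allocation_node are hypothetical names chosen for illustration. Whether the home node must itself be a member of the nodemask is not settled in this thread, and the sketch does not require it.

```c
#include <assert.h>

#define MAX_NODES 4
#define NO_HOME_NODE (-1)

/* Hypothetical SLIT-style distance table (assumption for illustration). */
static const int node_distance[MAX_NODES][MAX_NODES] = {
	{10, 20, 40, 80},
	{20, 10, 80, 40},
	{40, 80, 10, 20},
	{80, 40, 20, 10},
};

/* Hypothetical per-VA-range policy state. */
struct range_policy {
	unsigned long nodemask;	/* nodes the policy allows */
	int home_node;		/* NO_HOME_NODE if unset */
};

/*
 * Return the node that would be tried first: the allowed node closest
 * to the home node if one is set, otherwise closest to the local node.
 * Ties go to the lower node number. Returns -1 for an empty nodemask.
 */
int first_allocation_node(const struct range_policy *pol, int local_node)
{
	int origin = pol->home_node != NO_HOME_NODE ? pol->home_node
						    : local_node;
	int best = -1;

	assert(origin >= 0 && origin < MAX_NODES);
	for (int n = 0; n < MAX_NODES; n++) {
		if (!(pol->nodemask & (1UL << n)))
			continue;
		if (best < 0 ||
		    node_distance[origin][n] < node_distance[origin][best])
			best = n;
	}
	return best;
}
```

With nodes {1, 2, 3} allowed and the task running on node 0, the sketch picks node 1 when no home node is set and node 3 when the home node is 3, which is the sense in which the home node overrides the default local node.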
Another potential interface would be set_nodeorder which would
explicitly set the allocation fallback ordering. Again agnostic of the
underlying memory policy. This would be more generic but the question is
whether this is not too generic and whether there are usecases for that.
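For comparison, an explicit fallback ordering of the kind set_nodeorder would describe can be simulated as a plain walk over a caller-supplied list of node ids. This is a rough sketch only: no set_nodeorder interface exists in this thread beyond its name, and the per-node free-page counters and the alloc_page_in_order helper are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4

/*
 * Try to take one page by walking 'order' front to back. Returns the
 * node that satisfied the request, or -1 if every listed node is empty.
 * Nodes that do not appear in 'order' are never used.
 */
int alloc_page_in_order(const int *order, size_t count,
			long free_pages[MAX_NODES])
{
	for (size_t i = 0; i < count; i++) {
		int node = order[i];

		assert(node >= 0 && node < MAX_NODES);
		if (free_pages[node] > 0) {
			free_pages[node]--;
			return node;
		}
	}
	return -1;
}
```

Unlike a distance-based order, nothing here is derived from topology: the caller carries the whole ordering, which is what makes the interface generic and also what makes it harder to use correctly.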
I would suggest we wait for applications that really want a fallback
order other than a distance-based one before adding this. A
distance-based fallback order from a preferred node is well understood
from an application's point of view.
-aneesh