On 10/14/21 15:08, Michal Hocko wrote:
On Thu 14-10-21 15:00:22, Aneesh Kumar K.V wrote:
Michal Hocko <mhocko@xxxxxxxx> writes:
On Wed 13-10-21 18:53:55, Aneesh Kumar K.V wrote:
On 10/13/21 18:46, Andi Kleen wrote:
The difference with MPOL_BIND is the ability to specify a preferred node
which is the first node in the nodemask argument passed.
That's always the one with the lowest number. Isn't that quite limiting
in practice?
It seems if you really want to do that you would need another argument.
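
To make the limitation concrete, here is a minimal userspace sketch, assuming the nodemask is laid out as an array of unsigned long in which bit N stands for node N. The helper names nodemask_set() and nodemask_first() are made up for illustration and are not kernel or libnuma API; maxnode here is simply the number of bits the sketch scans.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: mark `node` in a bitmap where bit N means node N. */
static void nodemask_set(unsigned long *mask, unsigned long maxnode,
			 unsigned long node)
{
	assert(node < maxnode);
	mask[node / BITS_PER_ULONG] |= 1UL << (node % BITS_PER_ULONG);
}

/*
 * Hypothetical helper: return the first node set in the mask, or -1 if
 * the mask is empty. A bitmap carries no ordering information, so the
 * "first" node can only ever be the lowest-numbered one.
 */
static long nodemask_first(const unsigned long *mask, unsigned long maxnode)
{
	for (unsigned long node = 0; node < maxnode; node++)
		if (mask[node / BITS_PER_ULONG] & (1UL << (node % BITS_PER_ULONG)))
			return (long)node;
	return -1;
}
```

Setting node 3 and then node 1 yields the same mask as setting node 1 and then node 3, so the caller has no way to say "prefer node 3" through the mask alone.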
Yes. But that would make it a new syscall. Should we do that?
Yes, I do not see any reason to cram this into the existing syscall.
I am not yet sure what the syscall should look like, though. I can see
two use cases: one is a very specific node allocation fallback order
requirement, and the other is a preference for a CPU-less node over
other nodes. The two are slightly different.
How about
SYSCALL_DEFINE5(preferred_mbind, unsigned long, start, unsigned long, len,
		unsigned long, preferred_node, const unsigned long __user *, nmask,
		unsigned long, maxnode)
{
	return kernel_mbind(start, len, MPOL_PREFERRED_STRICT, preferred_node,
			    nmask, maxnode, 0);
}
Semantics? How does it interact with MPOL_PREFERRED_MANY, MPOL_BIND and
the others?
This allows specifying a new memory policy for the va range. We are
forced to use a new syscall because of the limitations of the current
mbind(2) syscall. We could make a generic sys_mbind2(), but I was not
sure whether we need to make it that complex. mbind() is already a
6-argument syscall.
Besides that it would be really great to finish the discussion about the
usecase before suggesting a new userspace API.
The application would like to hint a preferred node for allocating the
memory backing a va range and at the same time wants to avoid fallback
to some set of nodes (in the use case I am interested in, it must not
fall back to slow memory nodes).
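
As a rough userspace illustration of that use case, the sketch below derives the two arguments the proposed preferred_mbind() would take: a preferred node, and a fallback mask with the slow nodes removed. Everything here is hypothetical: struct pref_request and build_pref_request() are invented names, the mask is a single word for brevity, and requiring the preferred node to be part of the allowed mask is my assumption, not settled semantics.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical bundle of the arguments the proposed syscall would take. */
struct pref_request {
	unsigned long preferred_node;	/* node to try first */
	unsigned long nmask;		/* nodes allowed as fallback */
};

/*
 * Build a request that prefers `preferred_node` and never falls back to
 * any node in `slow_nodes`. Returns false if the preferred node is not
 * among the allowed nodes (an assumption of this sketch).
 */
static bool build_pref_request(unsigned long all_nodes,
			       unsigned long slow_nodes,
			       unsigned long preferred_node,
			       struct pref_request *out)
{
	unsigned long allowed = all_nodes & ~slow_nodes;

	assert(preferred_node < sizeof(unsigned long) * CHAR_BIT);
	if (!(allowed & (1UL << preferred_node)))
		return false;

	out->preferred_node = preferred_node;
	out->nmask = allowed;
	return true;
}
```

For example, with nodes 0-3 present and nodes 2-3 being slow memory, preferring node 1 gives a fallback mask of nodes 0-1 only. Note that node 1 is not the lowest-numbered node in that mask, which is exactly what the existing nodemask-only interface cannot express.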
-aneesh