On Thu 14-10-21 21:20:51, Aneesh Kumar K.V wrote:
> On 10/14/21 20:26, Michal Hocko wrote:
> > On Thu 14-10-21 18:59:14, Aneesh Kumar K.V wrote:
> > > On 10/14/21 17:11, Michal Hocko wrote:
> > > > On Thu 14-10-21 15:58:29, Aneesh Kumar K.V wrote:
> > > > > On 10/14/21 15:08, Michal Hocko wrote:
> > > > [...]
> > > > > > Besides that it would be really great to finish the discussion about the
> > > > > > usecase before suggesting a new userspace API.
> > > > > >
> > > > >
> > > > > Application would like to hint a preferred node for allocating memory
> > > > > backing a va range and at the same time wants to avoid fallback to some set
> > > > > of nodes (in the use case I am interested in, don't fall back to slow memory
> > > > > nodes).
> > > >
> > > > We do have means for that, right? You can set your memory policy and
> > > > then set the cpu affinity to the node you want to allocate from
> > > > initially. You can migrate to a different cpu/node if this is not the
> > > > preferred affinity. Why is that not usable?
> > >
> > > For the same reason you mentioned earlier, these nodes can be cpu less
> > > nodes.
> >
> > It would have been easier if you were explicit about the usecase rather
> > than let others guess.
> >
> > > > Also think about extensibility. Say I want to allocate from a set of
> > > > nodes first before falling back to the rest of the nodemask? If you want
> > > > to add a new API then think of other potential usecases.
> > > >
> > >
> > > Describing the specific allocation details becomes hard with preferred node
> > > being a nodemask. With the below interface
> > >
> > > SYSCALL_DEFINE5(preferred_mbind, unsigned long, start, unsigned long, len,
> > > const unsigned long __user *, preferred_nmask, const unsigned long __user
> > > *, fallback_nmask,
> > > unsigned long, maxnode)
> > > {
> > >
> > >
> > > 1. The preferred node is the first node in the preferred node mask
> > > 2. Then we try to allocate from nodes present in the preferred node mask
> > > which is closer to the first node in the preferred node mask
> > > 3. If the above fails, we try to allocate from nodes in the fallback node
> > > mask which is closer to the first node in the preferred nodemask.
> > >
> > > Isn't that too complicated? Do we have a real usecase for that?
> >
> > No, I think this is a suboptimal interface. AFAIU you really want to
> > define a "home" node(s) rather than any policy. Home node would
> > effectively override the default local node whatever policy you have as
> > it makes sense whether you have MPOL_PREFERRED_MANY or MPOL_BIND.
> >
>
> yes. I did describe it as below in an earlier email
>
> "We could do
> set_mempolicy(MPOLD_PREFERRED, nodemask(nodeX)))
> set_mempolicy(MPOLD_PREFFERED_EXTEND, nodemask(fallback nodemask for above
> PREFERRED policy)) "
>
> But I agree that restricting this to virtual address range is much better.
> Now I am wondering whether a nodemask is any better than a nodeid. The
> concept of home nodes is confusing when compared to home node.
> What would be the meaning of multiple nodes in a home nodes concept?

If you go with a nodemask then I expect we will hit an ordering requirement
very quickly. A single home node for a range makes some sense to me for
the cpu less nodes. I do not see why somebody might require them to be
the first one to consider but I can imagine there might be some
(semi)reasonable usecases out there.

In any case, implementation wise this shouldn't really be restricted to
any specific memory policy and should only override the local node where
we currently use it.

> Should we do
>
> SYSCALL_DEFINE4(home_node_mbind, unsigned long, start, unsigned long, len,
> unsigned long, home_node, unsigned long, flags)
>
>
> the flags is kept for future extension if any.
>
>
> I guess this home node will only apply w.r.t MPOL_BIND and
> MPOL_PREFERRED_MANY policy for now?

Why constrain that artificially? Interleaving has to start somewhere
as well, right? Not that it matters much in practice as only the first
allocation would be affected.

> > Another potential interface would be set_nodeorder which would
> > explicitly set the allocation fallback ordering. Again agnostic of the
> > underlying memory policy. This would be more generic but the question is
> > whether this is not too generic and whether there are usecases for that.
> >
>
> I would suggest we wait for applications really wanting a fallback order
> other than distance based one before adding this. Distance based fallback
> order from a preferred node is well understood from application point of
> view.

Right, I am not pushing into that direction. The idea was that unlike
home node this has more potential extensibility as a single home node
cannot capture preferences for more nodes for example.

-- 
Michal Hocko
SUSE Labs
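
To make the home node semantics discussed in this thread a bit more
concrete, here is a minimal userspace sketch in C. It is not kernel code
and not the proposed implementation: the distance table, MAX_NODES and
the function name home_node_fallback_order() are all hypothetical, made
up for illustration. It only models the idea that a home node replaces
the local node as the starting point, and that nodes allowed by the
policy nodemask are then tried in distance order from that home node.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4

/*
 * Hypothetical NUMA distance table (illustrative values only).
 * Nodes 0 and 1 stand for nodes with CPUs, nodes 2 and 3 for
 * cpu less memory nodes.
 */
static const int node_distance[MAX_NODES][MAX_NODES] = {
    { 10, 20, 40, 80 },
    { 20, 10, 80, 40 },
    { 40, 80, 10, 80 },
    { 80, 40, 80, 10 },
};

/*
 * Fill order[] with the nodes set in policy_mask, nearest to home_node
 * first. The home node takes the place of the local node; the policy
 * mask still decides which nodes are allowed at all.
 * Returns the number of entries written to order[].
 */
size_t home_node_fallback_order(int home_node, unsigned long policy_mask,
                                int order[MAX_NODES])
{
    size_t n = 0;

    assert(home_node >= 0 && home_node < MAX_NODES);

    /* Collect the nodes the policy allows, in node id order. */
    for (int node = 0; node < MAX_NODES; node++)
        if (policy_mask & (1UL << node))
            order[n++] = node;

    /* Stable insertion sort by distance from the home node. */
    for (size_t i = 1; i < n; i++) {
        int cur = order[i];
        size_t j = i;

        while (j > 0 && node_distance[home_node][order[j - 1]] >
                        node_distance[home_node][cur]) {
            order[j] = order[j - 1];
            j--;
        }
        order[j] = cur;
    }
    return n;
}
```

With this table and a policy mask covering nodes 0-2 (0x7), a home node
of 2 gives the order 2, 0, 1, while a home node of 1 gives 1, 0, 2. The
policy mask is the same in both cases; only the starting point changes,
which is why the home node can stay independent of the memory policy in
use.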