On 20-06-24 22:42:32, Michal Hocko wrote:
> On Wed 24-06-20 13:23:44, Ben Widawsky wrote:
> > On 20-06-24 22:07:50, Michal Hocko wrote:
> > > On Wed 24-06-20 13:01:40, Ben Widawsky wrote:
> > > > On 20-06-24 21:51:58, Michal Hocko wrote:
> > > > > On Wed 24-06-20 12:37:33, Ben Widawsky wrote:
> > > > > > On 20-06-24 20:39:17, Michal Hocko wrote:
> > > > > > > On Wed 24-06-20 09:16:43, Ben Widawsky wrote:
> > > [...]
> > > > > > > > > Or do I miss something that really requires more involved approach like
> > > > > > > > > building custom zonelists and other larger changes to the allocator?
> > > > > > > >
> > > > > > > > I think I'm missing how this allows selecting from multiple preferred nodes. In
> > > > > > > > this case when you try to get the page from the freelist, you'll get the
> > > > > > > > zonelist of the preferred node, and when you actually scan through on page
> > > > > > > > allocation, you have no way to filter out the non-preferred nodes. I think the
> > > > > > > > plumbing of multiple nodes has to go all the way through
> > > > > > > > __alloc_pages_nodemask(). But it's possible I've missed the point.
> > > > > > >
> > > > > > > policy_nodemask() will provide the nodemask which will be used as a
> > > > > > > filter on the policy_node.
> > > > > >
> > > > > > Ah, gotcha. Enabling independent masks seemed useful. Some bad decisions got me
> > > > > > to that point. UAPI cannot get independent masks, and callers of these functions
> > > > > > don't yet use them.
> > > > > >
> > > > > > So let me ask before I actually type it up and find it's much much simpler, is
> > > > > > there not some perceived benefit to having both masks being independent?
> > > > >
> > > > > I am not sure I follow. Which two masks do you have in mind? zonelist
> > > > > and user provided nodemask?
> > > >
> > > > Internally, a nodemask_t for preferred node, and a nodemask_t for bound nodes.
> > >
> > > Each mask is a local to its policy object.
> >
> > I mean for __alloc_pages_nodemask as an internal API. That is irrespective of
> > policy. Policy decisions are all made beforehand. The question from a few mails
> > ago was whether there is any use in keeping that change to
> > __alloc_pages_nodemask accepting two nodemasks.
>
> It is probably too late for me because I am still not following you
> mean. Maybe it would be better to provide a pseudo code what you have in
> mind. Anyway all that I am saying is that for the functionality that you
> propose and _if_ the fallback strategy is fixed then all you should need
> is to use the preferred nodemask for the __alloc_pages_nodemask and a
> fallback allocation to the full (NULL nodemask). So you first try what
> the userspace prefers - __GFP_RETRY_MAYFAIL will give you try hard but
> do not OOM if the memory is depleted semantic and the fallback
> allocation goes all the way to OOM on the complete memory depletion.
> So I do not see much point in a custom zonelist for the policy. Maybe as
> a micro-optimization to save some branches here and there.
>
> If you envision usecases which might want to control the fallback
> allocation strategy then this would get more complex because you
> would need a sorted list of zones to try but this would really require
> some solid usecase and it should build on top of a trivial
> implementation which really is BIND with the fallback.
>

I will implement what you suggest. I think it's a good suggestion. Here is what
I mean though:

-struct page *
-__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
-		       nodemask_t *nodemask);
+struct page *
+__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, nodemask_t *prefmask,
+		       nodemask_t *nodemask);

Is there any value in keeping two nodemasks as part of the interface?
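
For reference, the two-step fallback described in the quoted text (preferred
nodemask first, then the full NULL nodemask) can be modelled roughly as below.
This is only a userspace sketch under stated assumptions, not kernel code: the
toy_* names, the bitmask type and the per-node free-page counters are all made
up for illustration. In the kernel both attempts would be calls to
__alloc_pages_nodemask(), the first with the preferred nodemask and
__GFP_RETRY_MAYFAIL, the second with a NULL nodemask.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NODES 4

/* Hypothetical stand-in for nodemask_t: bit n set means node n is allowed. */
typedef unsigned int toy_nodemask_t;

/* Free pages per node; stands in for the real per-zone free lists. */
static int toy_free_pages[TOY_MAX_NODES];

/*
 * Toy analogue of an allocation filtered by one nodemask: take one page
 * from the first allowed node that has any. A NULL mask allows every node.
 * Returns the node id used, or -1 if no allowed node has free pages.
 */
static int toy_alloc_nodemask(const toy_nodemask_t *mask)
{
	for (int nid = 0; nid < TOY_MAX_NODES; nid++) {
		if (mask && !(*mask & (1u << nid)))
			continue;	/* node filtered out by the mask */
		if (toy_free_pages[nid] > 0) {
			toy_free_pages[nid]--;
			return nid;
		}
	}
	return -1;
}

/*
 * Two-step policy: try the preferred nodes first (the attempt that would
 * carry __GFP_RETRY_MAYFAIL in the kernel), then fall back to all nodes.
 */
static int toy_alloc_preferred_many(const toy_nodemask_t *prefmask)
{
	int nid;

	assert(prefmask != NULL);

	nid = toy_alloc_nodemask(prefmask);
	if (nid >= 0)
		return nid;

	/* Preferred nodes are depleted: retry with no mask at all. */
	return toy_alloc_nodemask(NULL);
}
```

In this model each attempt only ever needs a single mask, and the
preferred-versus-fallback decision stays in the caller.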