On Tue 23-06-20 09:12:11, Ben Widawsky wrote:
> On 20-06-23 13:20:48, Michal Hocko wrote:
[...]
> > It would be also great to provide a high level semantic description
> > here. I have very quickly glanced through patches and they are not
> > really trivial to follow with many incremental steps so the higher level
> > intention is lost easily.
> >
> > Do I get it right that the default semantic is essentially
> > 	- allocate page from the given nodemask (with __GFP_RETRY_MAYFAIL
> > 	  semantic)
> > 	- fallback to numa unrestricted allocation with the default
> > 	  numa policy on the failure
> >
> > Or are there any usecases to modify how hard to keep the preference over
> > the fallback?
>
> tl;dr is: yes, and no usecases.

OK, then I am wondering why the change has to be so involved. Except for
syscall plumbing the only real change to the allocator path would be
something like

static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
{
	/* Lower zones don't get a nodemask applied for MPOL_BIND */
	if (unlikely(policy->mode == MPOL_BIND ||
		     policy->mode == MPOL_PREFERED_MANY) &&
			apply_policy_zone(policy, gfp_zone(gfp)) &&
			cpuset_nodemask_valid_mems_allowed(&policy->v.nodes))
		return &policy->v.nodes;

	return NULL;
}

alloc_pages_current
	if (pol->mode == MPOL_INTERLEAVE)
		page = alloc_page_interleave(gfp, order, interleave_nodes(pol));
	else {
		gfp_t gfp_attempt = gfp;

		/*
		 * Make sure the first allocation attempt will try hard
		 * but eventually fail without OOM killer or other
		 * disruption before falling back to the full nodemask
		 */
		if (pol->mode == MPOL_PREFERED_MANY)
			gfp_attempt |= __GFP_RETRY_MAYFAIL;

		page = __alloc_pages_nodemask(gfp_attempt, order,
				policy_node(gfp, pol, numa_node_id()),
				policy_nodemask(gfp, pol));
		if (!page && pol->mode == MPOL_PREFERED_MANY)
			page = __alloc_pages_nodemask(gfp, order,
				numa_node_id(), NULL);
	}

	return page;

similar (well slightly more hairy) in alloc_pages_vma

Or do I miss something that really requires more involved approach like
building custom zonelists and other larger changes to the allocator?
-- 
Michal Hocko
SUSE Labs