Re: [PATCH 00/18] multiple preferred nodes

On 20-06-24 22:42:32, Michal Hocko wrote:
> On Wed 24-06-20 13:23:44, Ben Widawsky wrote:
> > On 20-06-24 22:07:50, Michal Hocko wrote:
> > > On Wed 24-06-20 13:01:40, Ben Widawsky wrote:
> > > > On 20-06-24 21:51:58, Michal Hocko wrote:
> > > > > On Wed 24-06-20 12:37:33, Ben Widawsky wrote:
> > > > > > On 20-06-24 20:39:17, Michal Hocko wrote:
> > > > > > > On Wed 24-06-20 09:16:43, Ben Widawsky wrote:
> > > [...]
> > > > > > > > > Or do I miss something that really requires more involved approach like
> > > > > > > > > building custom zonelists and other larger changes to the allocator?
> > > > > > > > 
> > > > > > > > I think I'm missing how this allows selecting from multiple preferred nodes. In
> > > > > > > > this case when you try to get the page from the freelist, you'll get the
> > > > > > > > zonelist of the preferred node, and when you actually scan through on page
> > > > > > > > allocation, you have no way to filter out the non-preferred nodes. I think the
> > > > > > > > plumbing of multiple nodes has to go all the way through
> > > > > > > > __alloc_pages_nodemask(). But it's possible I've missed the point.
> > > > > > > 
> > > > > > > policy_nodemask() will provide the nodemask which will be used as a
> > > > > > > filter on the policy_node.
> > > > > > 
> > > > > > Ah, gotcha. Enabling independent masks seemed useful. Some bad decisions got me
> > > > > > to that point. UAPI cannot get independent masks, and callers of these functions
> > > > > > don't yet use them.
> > > > > > 
> > > > > > So let me ask before I actually type it up and find it's much much simpler, is
> > > > > > there not some perceived benefit to having both masks being independent?
> > > > > 
> > > > > I am not sure I follow. Which two masks do you have in mind? zonelist
> > > > > and user provided nodemask?
> > > > 
> > > > Internally, a nodemask_t for preferred node, and a nodemask_t for bound nodes.
> > > 
> > > Each mask is local to its policy object.
> > 
> > I mean for __alloc_pages_nodemask as an internal API. That is irrespective of
> > policy. Policy decisions are all made beforehand. The question from a few mails
> > ago was whether there is any use in keeping that change to
> > __alloc_pages_nodemask accepting two nodemasks.
> 
> It is probably too late for me because I am still not following what you
> mean. Maybe it would be better to provide pseudo code for what you have in
> mind. Anyway all that I am saying is that for the functionality that you
> propose and _if_ the fallback strategy is fixed then all you should need
> is to use the preferred nodemask for the __alloc_pages_nodemask and a
> fallback allocation to the full (NULL nodemask). So you first try what
> the userspace prefers - __GFP_RETRY_MAYFAIL will give you the "try hard but
> do not OOM if the memory is depleted" semantic - and the fallback
> allocation goes all the way to OOM on complete memory depletion.
> So I do not see much point in a custom zonelist for the policy. Maybe as
> a micro-optimization to save some branches here and there.
> 
> If you envision usecases which might want to control the fallback
> allocation strategy then this would get more complex because you
> would need a sorted list of zones to try but this would really require
> some solid usecase and it should build on top of a trivial
> implementation which really is BIND with the fallback.
> 

I will implement what you suggest. I think it's a good suggestion. Here is what
I mean though:

-struct page *
-__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
-		       nodemask_t *nodemask);
+struct page *
+__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, nodemask_t *prefmask,
+		       nodemask_t *nodemask);

Is there any value in keeping two nodemasks as part of the interface?
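
For reference, the simpler approach suggested above (try the preferred
nodemask first, then fall back to a NULL nodemask) could be modelled roughly
as below. This is only a userspace toy, not kernel code: toy_nodemask_t,
toy_free_pages, toy_alloc_page() and toy_alloc_preferred_many() are all
hypothetical names made up for illustration, and the "allocator" just counts
free pages per node. In the real allocator the first attempt would carry
__GFP_RETRY_MAYFAIL and the second could go all the way to OOM.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NODES 4

/* Toy stand-in for nodemask_t: bit n set means node n is allowed. */
typedef unsigned long toy_nodemask_t;

/* Free pages per node in this model; the real allocator tracks far more. */
static unsigned int toy_free_pages[TOY_MAX_NODES];

/*
 * Toy stand-in for the page allocator entry point: take one page from the
 * lowest-numbered allowed node that still has free pages. A NULL mask means
 * "any node". Returns the node id used, or -1 if nothing is available.
 */
static int toy_alloc_page(const toy_nodemask_t *mask)
{
	for (int nid = 0; nid < TOY_MAX_NODES; nid++) {
		if (mask && !(*mask & (1UL << nid)))
			continue;	/* node filtered out by the mask */
		if (toy_free_pages[nid] > 0) {
			toy_free_pages[nid]--;
			return nid;
		}
	}
	return -1;
}

/*
 * "BIND with a fallback": first restrict the allocation to the preferred
 * nodes; only if that fails, retry with no nodemask at all.
 */
static int toy_alloc_preferred_many(const toy_nodemask_t *prefmask)
{
	int nid;

	assert(prefmask != NULL);

	/* First pass: preferred nodes only (would use __GFP_RETRY_MAYFAIL). */
	nid = toy_alloc_page(prefmask);
	if (nid < 0)
		nid = toy_alloc_page(NULL);	/* second pass: any node */

	return nid;
}
```

With one free page each on nodes 0 and 2 and a preferred mask of nodes 1-2,
the first call is served from node 2, the second falls back to node 0, and
the third fails. Note that the model needs only one mask per attempt, which
is the case against a second nodemask argument.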



