Re: [PATCH v2] mm: alloc_pages_bulk: remove assumption of populating only NULL elements

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 07 Mar 2025, Yunsheng Lin wrote:
> On 2025/3/7 5:14, NeilBrown wrote:
> > On Thu, 06 Mar 2025, Yunsheng Lin wrote:
> >> On 2025/3/6 7:41, NeilBrown wrote:
> >>> On Wed, 05 Mar 2025, Yunsheng Lin wrote:
> >>>>
> >>>> For the existing btrfs and sunrpc case, I am agreed that there
> >>>> might be valid use cases too, we just need to discuss how to
> >>>> meet the requirements of different use cases using simpler, more
> >>>> unified and effective APIs.
> >>>
> >>> We don't need "more unified".
> >>
> >> What I meant about 'more unified' is how to avoid duplicated code as
> >> much as possible for two different interfaces with similar‌ functionality.
> >>
> >> The best way I tried to avoid duplicated code as much as possible is
> >> to defragment the page_array before calling the alloc_pages_bulk()
> >> for the use case of btrfs and sunrpc so that alloc_pages_bulk() can
> >> be removed of the assumption populating only NULL elements, so that
> >> the API is simpler and more efficient.
> >>
> >>>
> >>> If there are genuinely two different use cases with clearly different
> >>> needs - even if only slightly different - then it is acceptable to have
> >>> two different interfaces.  Be sure to choose names which emphasise the
> >>> differences.
> >>
> >> The best name I can come up with for the use case of btrfs and sunrpc
> >> is something like alloc_pages_bulk_refill(), any better suggestion about
> >> the naming?
> > 
> > I think alloc_pages_bulk_refill() is a good name.
> > 
> > So:
> > - alloc_pages_bulk() would be given an uninitialised array of page
> >   pointers and a required count and would return the number of pages
> >   that were allocated
> > - alloc_pages_bulk_refill() would be given an initialised array of page
> >   pointers some of which might be NULL.  It would attempt to allocate
> >   pages for the non-NULL pointers and return the total number of
> 
> You meant 'NULL pointers' instead of 'non-NULL pointers' above?

Correct - thanks.

> 
> >   allocated pages in the array - just like the current
> >   alloc_pages_bulk().
> 
> I guess 'the total number of allocated pages in the array ' include
> the pages which are already in the array before calling the above
> API?

Yes - just what the current function does.
Though I don't know that we really need that detail.
I think there are three interesting return values:

- hard failure - don't bother trying again soon:   maybe -ENOMEM
- success - all pages are allocated:  maybe 0 (or 1?)
- partial success - at least one page allocated, ok to try again
  immediately - maybe -EAGAIN (or 0).

> 
> I guess it is worth mentioning that the current alloc_pages_bulk()
> may return different value with the same size of arrays, but with
> different layout of the same number of NULL pointers.
> For the same size of arrays with different layout for the NULL pointer
> below('*' indicate NULL pointer), and suppose buddy allocator is only
> able to allocate two pages:
> 1. P**P*P: will return 4.
> 2. P*PP**: will return 5.

I guess the documentation for the function is wrong then.
They lends weight to the suggestion that the current return value isn't
ideal.

> 
> If the new API do the page defragmentation, then it will always return
> the same value for different layout of the same number of NULL pointers.
> I guess the new one is the more perfered behavior as it provides a more
> defined semantic.
> 
> > 
> > sunrpc could usefully use both of these interfaces.
> > 
> > alloc_pages_bulk() could be implemented by initialising the array and
> > then calling alloc_pages_bulk_refill().  Or alloc_pages_bulk_refill()
> > could be implemented by compacting the pages and then calling
> > alloc_pages_bulk().
> > If we could duplicate the code and have two similar but different
> > functions.
> > 
> > The documentation for _refill() should make it clear that the pages
> > might get re-ordered.
> 
> Does 'the pages might get re-ordered' mean defragmenting the page_array?
> If yes, it makes sense to make it clear.

Correct.  Defragmentation is an option for implementing refill.  We
shouldn't promise to do that, but we should define the API in such a way
that it is allowed.

> 
> > 
> > Having looked at some of the callers I agree that the current interface
> > is not ideal for many of them, and that providing a simpler interface
> > would help.
> 
> +1
> 
> > 
> > Thanks,
> > NeilBrown
> 

If I were do work on this (and I'm not, so you don't have to follow my
ideas) I would separate the bulk_alloc into several inline functions and
combine them into the different interfaces that you want.  This will
result in duplicated object code without duplicated source code.  The
object code should be optimal.

The parts of the function are:
 - validity checks - fallback to single page allocation
 - select zone - fallback to single page allocation
 - allocate multiple pages in the zone and dispose of them
 - allocate a single page

The "dispose of them" is one of
  - add to a linked list
  - add to end of array
  - add to next hole in array

These three could be inline functions that the "allocate multiple pages"
and "allocate single page" functions call.  We can pass these as
function arguments and the compile will inline them.
I imagine these little function would take one page and return
a bool indicating if any more are wanted.

The three functions: alloc_bulk_array alloc_bulk_list
alloc_bulk_refill_array would each look like:

  validity checks: do we need to allocate anything?

  if want more than one page &&
     am allowed to do mulipage (e.g. not __GFP_ACCOUNT) &&
     zone = choose_zone() {
        alloc_multi_from_zone(zone, dispose_function)
  }
  if nothing allocated
     alloc_single_page(dispose_function)

Each would have a different dispose_function and the initial checks
would be quite different, as would the return value.

Thanks for working on this.

NeilBrown





[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux