Re: [RFC PATCH 0/1] Buddy allocator like folio split

Yang Shi <shy828301@xxxxxxxxx> · Fri, 18 Oct 2024 12:44:13 -0700

On Fri, Oct 18, 2024 at 12:11 PM Zi Yan <ziy@xxxxxxxxxx> wrote:
>
> On 18 Oct 2024, at 14:42, David Hildenbrand wrote:
>
> > On 09.10.24 00:37, Zi Yan wrote:
> >> Hi all,
> >
> > Hi!
> >
> >>
> >> Matthew and I have discussed a different way of splitting large
> >> folios. Instead of splitting one folio uniformly into smaller folios
> >> of the same order, a buddy-allocator-like split can reduce the total
> >> number of resulting folios and the amount of memory needed for the
> >> multi-index xarray split, and keep more large folios after a split.
> >> In addition, both Hugh[1] and Ryan[2] had similar suggestions before.
> >>
> >> The patch is an initial implementation. It passes simple order-9 to
> >> lower order split tests for anonymous folios and pagecache folios.
> >> There are still a lot of TODOs before it can go upstream, but I would
> >> like to gather feedback before that.
> >
> > Interesting, but I don't see any actual users besides the debug/test interface wired up.
>
> Right. I am working on it now, since two potential users, anon large folios
> and truncate, might need a more sophisticated implementation to take full
> advantage of this new split.
>
> For anon large folios, this might be open to debate: if only a subset of
> orders is enabled, I assume folio_split() can only split to smaller
> folios with the enabled orders. For example, to get one order-0 from
> an order-9 when only order-4 (64KB on x86) is enabled, folio_split()
> can only split the order-9 into 16 order-0s and 31 order-4s, unless we are
> OK with anon large folios of non-enabled orders appearing in the system.
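
As a rough illustration of the folio counts in the order-9 example above,
the following Python sketch models three cases: a uniform split, an
unrestricted buddy-allocator-like split, and a split restricted to enabled
orders. It is a toy model, not kernel code. The function names, and the rule
that the target order is always allowed, are assumptions made for
illustration only.

```python
def uniform_split_orders(old_order, target_order):
    """Uniform split: every resulting folio has the target order."""
    return [target_order] * (1 << (old_order - target_order))


def buddy_split_orders(old_order, target_order):
    """Buddy-like split: halve repeatedly, keeping the untouched half whole."""
    # One sibling each of order old_order-1 ... target_order is left behind.
    orders = list(range(old_order - 1, target_order - 1, -1))
    # The last piece is the folio of the target order that was asked for.
    orders.append(target_order)
    return orders


def restricted_split_orders(old_order, target_order, enabled):
    """Like buddy_split_orders(), but only step down to enabled orders.

    Assumption: the target order itself is always allowed.
    """
    allowed = sorted(
        (o for o in set(enabled) | {target_order}
         if target_order <= o < old_order),
        reverse=True,
    )
    orders = []
    cur = old_order
    for nxt in allowed:
        pieces = 1 << (cur - nxt)           # cur splits into this many nxt folios
        orders.extend([nxt] * (pieces - 1))  # siblings that are not split further
        cur = nxt
    orders.append(cur)                       # the piece containing the target
    return orders


if __name__ == "__main__":
    print(len(uniform_split_orders(9, 0)))   # 512
    print(len(buddy_split_orders(9, 0)))     # 10
    r = restricted_split_orders(9, 0, enabled={4})
    print(r.count(0), r.count(4))            # 16 31
```

In this model the unrestricted split of an order-9 leaves 10 folios (orders
8 down to 1, plus two order-0s), while restricting the result to order-4
and order-0 leaves the 16 + 31 = 47 folios mentioned above.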

For anon large folios, deferred split may be a problem too. Deferred
split is typically used to free subpages that have been unmapped by, for
example, MADV_DONTNEED. But we don't know which subpages are unmapped
without iterating over every subpage and reading its _mapcount.

>
> For truncate, the example you give below is an easy one. For cases like
> punching the 3rd to 5th order-0 of an order-3, [O0, O0, __, __, __, O0, O0, O0],
> I am wondering which approach is better:
>
> 1. two folio_split()s,
>   1) split second order-1 from order-3, 2) split order-0 from the second order-2;
>
> 2. one folio_split(), by making folio_split() support arbitrary range splits,
> so the two steps in 1 can be done in one shot, which saves the unmapping and
> remapping cost.
>
> Maybe I should go for 1 first as the easy route, but I still need an algorithm
> in truncate to figure out the sequence of folio_split() calls.
>
> >
> > I assume ftruncate() / fallocate(PUNCH_HOLE) might be good use cases? For example, when punching 1M of a 2M folio, we can just leave a 1M folio in the pagecache.
>
> Yes, I am trying to make this work.
>
> >
> > Any other obvious users you have in mind?
>
> Presumably, folio_split() should replace all split_huge*() to reduce the total
> number of folios after a split. But for swapcache folios, I need to figure
> out whether the swap system works well with buddy-allocator-like splits.
>
>
>
> Best Regards,
> Yan, Zi