On Fri, Oct 18, 2024 at 12:11 PM Zi Yan <ziy@xxxxxxxxxx> wrote:
>
> On 18 Oct 2024, at 14:42, David Hildenbrand wrote:
>
> > On 09.10.24 00:37, Zi Yan wrote:
> >> Hi all,
> >
> > Hi!
> >
> >>
> >> Matthew and I have discussed about a different way of splitting large
> >> folios. Instead of split one folio uniformly into the same order smaller
> >> ones, doing buddy allocator like split can reduce the total number of
> >> resulting folios, the amount of memory needed for multi-index xarray
> >> split, and keep more large folios after a split. In addition, both
> >> Hugh[1] and Ryan[2] had similar suggestions before.
> >>
> >> The patch is an initial implementation. It passes simple order-9 to
> >> lower order split tests for anonymous folios and pagecache folios.
> >> There are still a lot of TODOs to make it upstream. But I would like to gather
> >> feedbacks before that.
> >
> > Interesting, but I don't see any actual users besides the debug/test interface wired up.
>
> Right. I am working on it now, since two potential users, anon large folios
> and truncate, might need more sophisticated implementation to fully take
> advantage of this new split.
>
> For anon large folios, this might be open to debate, if only a subset of
> orders are enabled, I assume folio_split() can only split to smaller
> folios with the enabled orders. For example, to get one order-0 from
> an order-9, and only order-4 (64KB on x86) is enabled, folio_split()
> can only split the order-9 to 16 order-0s, 31 order-4s, unless we are
> OK with anon large folios with not enabled orders appear in the system.

For anon large folios, deferred split may be a problem too. Deferred
split is typically used to free subpages that were unmapped by, for
example, MADV_DONTNEED. But we don't know which subpages are unmapped
without iterating over every subpage and reading its _mapcount.

>
> For truncate, the example you give below is an easy one. For cases like
> punching from 3rd to 5th order-0 of a order-3, [O0, O0, __, __, __, O0, O0, O0],
> I am thinking which approach is better:
>
> 1. two folio_split()s,
>    1) split second order-1 from order-3, 2) split order-0 from the second order-2;
>
> 2. one folio_split() by making folio_split() to support arbitrary range split,
>    so two steps in 1 can be done in one shot, which saves unmapping and remapping
>    cost.
>
> Maybe I should go for 1 first as an easy route, but I still need an algorithm
> in truncate to figure out the way of calling folio_split()s.
>
> >
> > I assume ftruncate() / fallocate(PUNCH_HOLE) might be good use cases? For example, when punching 1M of a 2M folio, we can just leave a 1M folio in the pagecache.
>
> Yes, I am trying to make this work.
>
> >
> > Any other obvious users you have in mind?
>
> Presumably, folio_split() should replace all split_huge*() to reduce total
> number of folios after a split. But for swapcache folios, I need to figure
> out if swap system works well with buddy allocator like splits.
>
>
>
> Best Regards,
> Yan, Zi
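
To make the arithmetic in the quoted text concrete, here is a small
user-space Python model of a buddy-allocator-like split. It is only a
sketch: buddy_split() and its parameters are hypothetical names made up
for illustration, not the folio_split() from the patch, and it tracks
nothing but page ranges (no mappings, xarray, or locking). It covers the
order-9 case, the "only order-4 enabled" case, and the two-step punch of
pages 2..4 (the 3rd to 5th page) out of an order-3 folio.

```python
def buddy_split(order, target, new_order=0, base=0):
    """Model a buddy-allocator-like split of one folio (hypothetical helper).

    The folio covers pages [base, base + 2**order). The half containing
    page `target` is halved repeatedly until it reaches `new_order`; the
    other half is kept whole each time. Returns (first_page, order)
    tuples sorted by first page.
    """
    assert base <= target < base + (1 << order)
    pieces = []
    start = base
    while order > new_order:
        order -= 1
        half = 1 << order
        if target < start + half:
            pieces.append((start + half, order))  # upper half stays whole
        else:
            pieces.append((start, order))         # lower half stays whole
            start += half
    pieces.append((start, order))                 # piece containing target
    return sorted(pieces)


# Order-9 folio, isolate page 0 as an order-0 folio.
order9 = buddy_split(9, target=0)
print(len(order9), "folios instead of", 1 << 9)

# Assumption: only order-4 is enabled for anon folios. Stop the buddy split
# at order-4, break every larger leftover uniformly into order-4 folios, and
# break the order-4 folio holding the target (it starts at page 0) into
# order-0 folios.
to_order4 = buddy_split(9, target=0, new_order=4)
order4_count = sum(1 << (o - 4) for s, o in to_order4 if s != 0)
order0_count = 1 << 4
print(order4_count, "order-4 folios and", order0_count, "order-0 folios")

# Punch pages 2..4 out of an order-3 folio with two splits (option 1).
hole = range(2, 5)
step1 = buddy_split(3, target=2, new_order=1)     # isolates pages 2-3
step2 = []
for s, o in step1:
    if s <= 4 < s + (1 << o):                     # piece that holds page 4
        step2 += buddy_split(o, target=4, base=s)
    else:
        step2.append((s, o))
# Keep every piece that is not entirely inside the hole.
kept = [(s, o) for s, o in step2
        if not all(p in hole for p in range(s, s + (1 << o)))]
print(kept)
```

In this model the order-9 split yields 10 folios rather than 512, the
order-4-only case yields 31 order-4 folios plus 16 order-0 folios as in
the quoted example, and the two-step punch leaves one order-1, one
order-0 and one order-1 folio around the hole.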