> On Aug 10, 2023, at 23:29, Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>
> Hi All,
>
> This is v5 of a series to implement variable order, large folios for anonymous
> memory. (currently called "LARGE_ANON_FOLIO", previously called "FLEXIBLE_THP").
> The objective of this is to improve performance by allocating larger chunks of
> memory during anonymous page faults:
>
> 1) Since SW (the kernel) is dealing with larger chunks of memory than base
>    pages, there are efficiency savings to be had; fewer page faults, batched PTE
>    and RMAP manipulation, reduced lru list, etc. In short, we reduce kernel
>    overhead. This should benefit all architectures.
> 2) Since we are now mapping physically contiguous chunks of memory, we can take
>    advantage of HW TLB compression techniques. A reduction in TLB pressure
>    speeds up kernel and user space. arm64 systems have 2 mechanisms to coalesce
>    TLB entries; "the contiguous bit" (architectural) and HPA (uarch).
>
> This patch set deals with the SW side of things (1). (2) is being tackled in a
> separate series. The new behaviour is hidden behind a new Kconfig switch,
> LARGE_ANON_FOLIO, which is disabled by default. Although the eventual aim is to
> enable it by default.
>
> My hope is that we are pretty much there with the changes at this point;
> hopefully this is sufficient to get an initial version merged so that we can
> scale up characterization efforts. Although they should not be merged until the
> prerequisites are complete. These are in progress and tracked at [5].
>
> This series is based on mm-unstable (ad3232df3e41).
>
> I'm going to be out on holiday from the end of today, returning on 29th
> August. So responses will likely be patchy, as I'm terrified of posting
> to list from my phone!
>
>
> Testing
> -------
>
> This version adds patches to mm selftests so that the cow tests explicitly test
> large anon folios, in the same way that thp is tested.
> When enabled you should see something similar at the start of the test suite:
>
>   # [INFO] detected large anon folio size: 32 KiB
>
> Then the following results are expected. The fails and skips are due to existing
> issues in mm-unstable:
>
>   # Totals: pass:207 fail:16 xfail:0 xpass:0 skip:85 error:0
>
> Existing mm selftests reveal 1 regression in khugepaged tests when
> LARGE_ANON_FOLIO is enabled:
>
>   Run test: collapse_max_ptes_none (khugepaged:anon)
>   Maybe collapse with max_ptes_none exceeded.... Fail
>   Unexpected huge page
>
> I believe this is because khugepaged currently skips non-order-0 pages when
> looking for collapse opportunities and should get fixed with the help of
> DavidH's work to create a mechanism to precisely determine shared vs exclusive
> pages.
>
>
> Changes since v4 [4]
> --------------------
>
> - Removed "arm64: mm: Override arch_wants_pte_order()" patch; arm64
>   now uses the default order-3 size. I have moved this patch over to
>   the contpte series.
> - Added "mm: Allow deferred splitting of arbitrary large anon folios" back
>   into series. I originally removed this at v2 to add to a separate series,
>   but that series has transformed significantly and it no longer fits, so
>   bringing it back here.
> - Reintroduced dependency on set_ptes(); Originally dropped this at v2, but
>   set_ptes() is in mm-unstable now.
> - Updated policy for when to allocate LAF; only fallback to order-0 if
>   MADV_NOHUGEPAGE is present or if THP disabled via prctl; no longer rely on
>   sysfs's never/madvise/always knob.
> - Fallback to order-0 whenever uffd is armed for the vma, not just when
>   uffd-wp is set on the pte.
> - alloc_anon_folio() now returns `struct folio *`, where errors are encoded
>   with ERR_PTR().
>
> The last 3 changes were proposed by Yu Zhao - thanks!
>
>
> Changes since v3 [3]
> --------------------
>
> - Renamed feature from FLEXIBLE_THP to LARGE_ANON_FOLIO.
> - Removed `flexthp_unhinted_max` boot parameter.
>   Discussion concluded that a sysctl is preferable but we will wait until real
>   workload needs it.
> - Fixed uninitialized `addr` on read fault path in do_anonymous_page().
> - Added mm selftests for large anon folios in cow test suite.
>
>
> Changes since v2 [2]
> --------------------
>
> - Dropped commit "Allow deferred splitting of arbitrary large anon folios"
>   - Huang, Ying suggested the "batch zap" work (which I dropped from this
>     series after v1) is a prerequisite for merging FLEXIBLE_THP, so I've
>     moved the deferred split patch to a separate series along with the batch
>     zap changes. I plan to submit this series early next week.
> - Changed folio order fallback policy
>   - We no longer iterate from preferred to 0 looking for acceptable policy
>   - Instead we iterate through preferred, PAGE_ALLOC_COSTLY_ORDER and 0 only
> - Removed vma parameter from arch_wants_pte_order()
> - Added command line parameter `flexthp_unhinted_max`
>   - clamps preferred order when vma hasn't explicitly opted-in to THP
> - Never allocate large folio for MADV_NOHUGEPAGE vma (or when THP is disabled
>   for process or system).
> - Simplified implementation and integration with do_anonymous_page()
> - Removed dependency on set_ptes()
>
>
> Changes since v1 [1]
> --------------------
>
> - removed changes to arch-dependent vma_alloc_zeroed_movable_folio()
>   - replaced with arch-independent alloc_anon_folio()
>     - follows THP allocation approach
> - no longer retry with intermediate orders if allocation fails
>   - fallback directly to order-0
> - remove folio_add_new_anon_rmap_range() patch
>   - instead add its new functionality to folio_add_new_anon_rmap()
> - remove batch-zap pte mappings optimization patch
>   - remove enabler folio_remove_rmap_range() patch too
>   - These offer real perf improvement so will submit separately
> - simplify Kconfig
>   - single FLEXIBLE_THP option, which is independent of arch
>   - depends on TRANSPARENT_HUGEPAGE
>   - when enabled default to max anon folio size of 64K unless arch
>     explicitly overrides
> - simplify changes to do_anonymous_page():
>   - no more retry loop
>
>
> [1] https://lore.kernel.org/linux-mm/20230626171430.3167004-1-ryan.roberts@xxxxxxx/
> [2] https://lore.kernel.org/linux-mm/20230703135330.1865927-1-ryan.roberts@xxxxxxx/
> [3] https://lore.kernel.org/linux-mm/20230714160407.4142030-1-ryan.roberts@xxxxxxx/
> [4] https://lore.kernel.org/linux-mm/20230726095146.2826796-1-ryan.roberts@xxxxxxx/
> [5] https://lore.kernel.org/linux-mm/f8d47176-03a8-99bf-a813-b5942830fd73@xxxxxxx/
>
>
> Thanks,
> Ryan
>
> Ryan Roberts (5):
>   mm: Allow deferred splitting of arbitrary large anon folios
>   mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()
>   mm: LARGE_ANON_FOLIO for improved performance
>   selftests/mm/cow: Generalize do_run_with_thp() helper
>   selftests/mm/cow: Add large anon folio tests
>
>  include/linux/pgtable.h          |  13 ++
>  mm/Kconfig                       |  10 ++
>  mm/memory.c                      | 144 +++++++++++++++++--
>  mm/rmap.c                        |  31 +++--
>  tools/testing/selftests/mm/cow.c | 229 ++++++++++++++++++++++---------
>  5 files changed, 347 insertions(+), 80 deletions(-)
>
> --
> 2.25.1
>

I know Ryan is away at the moment, but I can't find the base commit mentioned in the cover letter (mm-unstable ad3232df3e41). Could anybody point me to it so that I can apply the series with b4 and test it?

Thanks,
Itaru.