> On Aug 16, 2023, at 18:25, Yin, Fengwei <fengwei.yin@xxxxxxxxx> wrote:
>
>> On 8/16/2023 4:11 PM, Itaru Kitayama wrote:
>>
>>>> On Aug 10, 2023, at 23:29, Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>>>
>>> Hi All,
>>>
>>> This is v5 of a series to implement variable order, large folios for anonymous
>>> memory. (currently called "LARGE_ANON_FOLIO", previously called "FLEXIBLE_THP").
>>> The objective of this is to improve performance by allocating larger chunks of
>>> memory during anonymous page faults:
>>>
>>> 1) Since SW (the kernel) is dealing with larger chunks of memory than base
>>>    pages, there are efficiency savings to be had; fewer page faults, batched PTE
>>>    and RMAP manipulation, reduced lru list, etc. In short, we reduce kernel
>>>    overhead. This should benefit all architectures.
>>> 2) Since we are now mapping physically contiguous chunks of memory, we can take
>>>    advantage of HW TLB compression techniques. A reduction in TLB pressure
>>>    speeds up kernel and user space. arm64 systems have 2 mechanisms to coalesce
>>>    TLB entries; "the contiguous bit" (architectural) and HPA (uarch).
>>>
>>> This patch set deals with the SW side of things (1). (2) is being tackled in a
>>> separate series. The new behaviour is hidden behind a new Kconfig switch,
>>> LARGE_ANON_FOLIO, which is disabled by default. Although the eventual aim is to
>>> enable it by default.
>>>
>>> My hope is that we are pretty much there with the changes at this point;
>>> hopefully this is sufficient to get an initial version merged so that we can
>>> scale up characterization efforts. Although they should not be merged until the
>>> prerequisites are complete. These are in progress and tracked at [5].
>>>
>>> This series is based on mm-unstable (ad3232df3e41).
>>>
>>> I'm going to be out on holiday from the end of today, returning on 29th
>>> August. So responses will likely be patchy, as I'm terrified of posting
>>> to list from my phone!
>>>
>>>
>>> Testing
>>> -------
>>>
>>> This version adds patches to mm selftests so that the cow tests explicitly test
>>> large anon folios, in the same way that thp is tested. When enabled you should
>>> see something similar at the start of the test suite:
>>>
>>>   # [INFO] detected large anon folio size: 32 KiB
>>>
>>> Then the following results are expected. The fails and skips are due to existing
>>> issues in mm-unstable:
>>>
>>>   # Totals: pass:207 fail:16 xfail:0 xpass:0 skip:85 error:0
>>>
>>> Existing mm selftests reveal 1 regression in khugepaged tests when
>>> LARGE_ANON_FOLIO is enabled:
>>>
>>>   Run test: collapse_max_ptes_none (khugepaged:anon)
>>>   Maybe collapse with max_ptes_none exceeded.... Fail
>>>   Unexpected huge page
>>>
>>> I believe this is because khugepaged currently skips non-order-0 pages when
>>> looking for collapse opportunities and should get fixed with the help of
>>> DavidH's work to create a mechanism to precisely determine shared vs exclusive
>>> pages.
>>>
>>>
>>> Changes since v4 [4]
>>> --------------------
>>>
>>> - Removed "arm64: mm: Override arch_wants_pte_order()" patch; arm64
>>>   now uses the default order-3 size. I have moved this patch over to
>>>   the contpte series.
>>> - Added "mm: Allow deferred splitting of arbitrary large anon folios" back
>>>   into series. I originally removed this at v2 to add to a separate series,
>>>   but that series has transformed significantly and it no longer fits, so
>>>   bringing it back here.
>>> - Reintroduced dependency on set_ptes(); Originally dropped this at v2, but
>>>   set_ptes() is in mm-unstable now.
>>> - Updated policy for when to allocate LAF; only fallback to order-0 if
>>>   MADV_NOHUGEPAGE is present or if THP disabled via prctl; no longer rely on
>>>   sysfs's never/madvise/always knob.
>>> - Fallback to order-0 whenever uffd is armed for the vma, not just when
>>>   uffd-wp is set on the pte.
>>> - alloc_anon_folio() now returns `struct folio *`, where errors are encoded
>>>   with ERR_PTR().
>>>
>>> The last 3 changes were proposed by Yu Zhao - thanks!
>>>
>>>
>>> Changes since v3 [3]
>>> --------------------
>>>
>>> - Renamed feature from FLEXIBLE_THP to LARGE_ANON_FOLIO.
>>> - Removed `flexthp_unhinted_max` boot parameter. Discussion concluded that a
>>>   sysctl is preferable but we will wait until real workload needs it.
>>> - Fixed uninitialized `addr` on read fault path in do_anonymous_page().
>>> - Added mm selftests for large anon folios in cow test suite.
>>>
>>>
>>> Changes since v2 [2]
>>> --------------------
>>>
>>> - Dropped commit "Allow deferred splitting of arbitrary large anon folios"
>>>     - Huang, Ying suggested the "batch zap" work (which I dropped from this
>>>       series after v1) is a prerequisite for merging FLEXIBLE_THP, so I've
>>>       moved the deferred split patch to a separate series along with the batch
>>>       zap changes. I plan to submit this series early next week.
>>> - Changed folio order fallback policy
>>>     - We no longer iterate from preferred to 0 looking for acceptable policy
>>>     - Instead we iterate through preferred, PAGE_ALLOC_COSTLY_ORDER and 0 only
>>> - Removed vma parameter from arch_wants_pte_order()
>>> - Added command line parameter `flexthp_unhinted_max`
>>>     - clamps preferred order when vma hasn't explicitly opted-in to THP
>>> - Never allocate large folio for MADV_NOHUGEPAGE vma (or when THP is disabled
>>>   for process or system).
>>> - Simplified implementation and integration with do_anonymous_page()
>>> - Removed dependency on set_ptes()
>>>
>>>
>>> Changes since v1 [1]
>>> --------------------
>>>
>>> - removed changes to arch-dependent vma_alloc_zeroed_movable_folio()
>>>     - replaced with arch-independent alloc_anon_folio()
>>>     - follows THP allocation approach
>>> - no longer retry with intermediate orders if allocation fails
>>>     - fallback directly to order-0
>>> - remove folio_add_new_anon_rmap_range() patch
>>>     - instead add its new functionality to folio_add_new_anon_rmap()
>>> - remove batch-zap pte mappings optimization patch
>>>     - remove enabler folio_remove_rmap_range() patch too
>>>     - These offer real perf improvement so will submit separately
>>> - simplify Kconfig
>>>     - single FLEXIBLE_THP option, which is independent of arch
>>>     - depends on TRANSPARENT_HUGEPAGE
>>>     - when enabled default to max anon folio size of 64K unless arch
>>>       explicitly overrides
>>> - simplify changes to do_anonymous_page():
>>>     - no more retry loop
>>>
>>>
>>> [1] https://lore.kernel.org/linux-mm/20230626171430.3167004-1-ryan.roberts@xxxxxxx/
>>> [2] https://lore.kernel.org/linux-mm/20230703135330.1865927-1-ryan.roberts@xxxxxxx/
>>> [3] https://lore.kernel.org/linux-mm/20230714160407.4142030-1-ryan.roberts@xxxxxxx/
>>> [4] https://lore.kernel.org/linux-mm/20230726095146.2826796-1-ryan.roberts@xxxxxxx/
>>> [5] https://lore.kernel.org/linux-mm/f8d47176-03a8-99bf-a813-b5942830fd73@xxxxxxx/
>>>
>>>
>>> Thanks,
>>> Ryan
>>>
>>> Ryan Roberts (5):
>>>   mm: Allow deferred splitting of arbitrary large anon folios
>>>   mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()
>>>   mm: LARGE_ANON_FOLIO for improved performance
>>>   selftests/mm/cow: Generalize do_run_with_thp() helper
>>>   selftests/mm/cow: Add large anon folio tests
>>>
>>>  include/linux/pgtable.h          |  13 ++
>>>  mm/Kconfig                       |  10 ++
>>>  mm/memory.c                      | 144 +++++++++++++++++--
>>>  mm/rmap.c                        |  31 +++--
>>>  tools/testing/selftests/mm/cow.c | 229 ++++++++++++++++++++++---------
>>>  5 files changed, 347 insertions(+), 80 deletions(-)
>>>
>>> --
>>> 2.25.1
>>>
>>
>> I know Ryan is away currently, but as I can’t find the base commit mentioned in the cover letter to be based off of can anybody point me to it so I can use b4 for applying the series and test?
>>
> Ryan mentioned: This series is based on mm-unstable (ad3232df3e41).

Couldn’t find the commit in the mm-unstable branch I checked out today. I’m trying to use Andrew’s mm tree for the first time in a decade so I’m doing something wrong though.

>
> I believe you can apply the patchset to latest mm-unstable.

Okay. Will try that.

Thanks,
Itaru.

>
> Regards
> Yin, Fengwei
>
>> Thanks,
>> Itaru.
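
The workflow discussed above (check whether the stated base commit is still present, otherwise fall back to the latest mm-unstable, then apply with b4) could be sketched roughly as follows. This is only a sketch under assumptions: the remote name `mm`, the branch name `laf-v5`, and the helper names `have_commit` and `apply_series` are hypothetical, and the Message-ID of the cover letter is left as a placeholder because the archive redacts it. It assumes a b4 version that provides `b4 shazam`.

```shell
# Hypothetical helper: succeed if the given commit exists in the current repo.
have_commit() {
    git cat-file -e "${1}^{commit}" 2>/dev/null
}

# Hypothetical helper: apply a series (by Message-ID) on top of the stated
# base commit if it is still present, otherwise on the latest mm-unstable.
# Assumes a remote called "mm" that points at Andrew Morton's mm tree.
apply_series() {
    msgid="$1"   # Message-ID of the cover letter (placeholder, not filled in)
    base="$2"    # base commit named in the cover letter, e.g. ad3232df3e41
    branch="$3"  # local branch to create, e.g. laf-v5

    git fetch mm mm-unstable || return 1

    if have_commit "$base"; then
        # The base is still reachable locally: branch from it directly.
        git checkout -b "$branch" "$base" || return 1
    else
        # mm-unstable is rebased regularly, so the old base may be gone;
        # fall back to the tip that was just fetched.
        echo "base $base not found, using latest mm-unstable" >&2
        git checkout -b "$branch" FETCH_HEAD || return 1
    fi

    # Fetch the series from lore and apply it to the current branch.
    b4 shazam "$msgid"
}

# Example invocation (not run here; <message-id> must be supplied by the user):
#   apply_series '<message-id>' ad3232df3e41 laf-v5
```

Because mm-unstable is rebased, a commit ID quoted in a cover letter can disappear from the branch within days, which would be consistent with the base not being found in a fresh checkout.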