Hi Baolin,

On Tue, Jun 04, 2024 at 06:17:44PM +0800, Baolin Wang wrote:
> Anonymous pages have already been supported for multi-size (mTHP) allocation
> through commit 19eaf44954df, that can allow THP to be configured through the
> sysfs interface located at '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
>
> However, the anonymous shmem will ignore the anonymous mTHP rule configured
> through the sysfs interface, and can only use the PMD-mapped THP, that is not
> reasonable. Many implement anonymous page sharing through mmap(MAP_SHARED |
> MAP_ANONYMOUS), especially in database usage scenarios, therefore, users expect
> to apply an unified mTHP strategy for anonymous pages, also including the
> anonymous shared pages, in order to enjoy the benefits of mTHP. For example,
> lower latency than PMD-mapped THP, smaller memory bloat than PMD-mapped THP,
> contiguous PTEs on ARM architecture to reduce TLB miss etc.
>
> As discussed in the bi-weekly MM meeting[1], the mTHP controls should control
> all of shmem, not only anonymous shmem, but support will be added iteratively.
> Therefore, this patch set starts with support for anonymous shmem.
>
> The primary strategy is similar to supporting anonymous mTHP. Introduce
> a new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
> which can have almost the same values as the top-level
> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', with adding a new
> additional "inherit" option and dropping the testing options 'force' and
> 'deny'. By default all sizes will be set to "never" except PMD size, which
> is set to "inherit". This ensures backward compatibility with the anonymous
> shmem enabled of the top level, meanwhile also allows independent control of
> anonymous shmem enabled for each mTHP.
>
> Use the page fault latency tool to measure the performance of 1G anonymous shmem

I'm not familiar with this tool. Could you share which repo/tool you are
referring to?
Also, are you running or are you aware of any other tools/tests available for
shmem that we can use to make sure we do not introduce any regressions?

Thanks!
Daniel

> with 32 threads on my machine environment with: ARM64 Architecture, 32 cores,
> 125G memory:
> base: mm-unstable
> user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
> 0.04s        3.10s         83516.416                  2669684.890
>
> mm-unstable + patchset, anon shmem mTHP disabled
> user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
> 0.02s        3.14s         82936.359                  2630746.027
>
> mm-unstable + patchset, anon shmem 64K mTHP enabled
> user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
> 0.08s        0.31s         678630.231                 17082522.495
>
> From the data above, it is observed that the patchset has a minimal impact when
> mTHP is not enabled (some fluctuations observed during testing). When enabling 64K
> mTHP, there is a significant improvement of the page fault latency.
>
> [1] https://lore.kernel.org/all/f1783ff0-65bd-4b2b-8952-52b6822a0835@xxxxxxxxxx/
>
> Changes from v3:
>  - Drop 'force' and 'deny' testing options for each mTHP.
>  - Use new helper update_mmu_tlb_range(), per Lance.
>  - Update documentation to drop "anonymous thp" terminology, per David.
>  - Initialize the 'suitable_orders' in shmem_alloc_and_add_folio(),
>    reported by kernel test robot.
>  - Fix the highest mTHP order in shmem_get_unmapped_area().
>  - Update some commit message.
>
> Changes from v2:
>  - Rebased to mm/mm-unstable.
>  - Remove 'huge' parameter for shmem_alloc_and_add_folio(), per Lance.
>
> Changes from v1:
>  - Drop the patch that re-arranges the position of highest_order() and
>    next_order(), per Ryan.
>  - Modify the finish_fault() to fix VA alignment issue, per Ryan and
>    David.
>  - Fix some building issues, reported by Lance and kernel test robot.
>  - Update some commit message.
>
> Changes from RFC:
>  - Rebase the patch set against the new mm-unstable branch, per Lance.
>  - Add a new patch to export highest_order() and next_order().
>  - Add a new patch to align mTHP size in shmem_get_unmapped_area().
>  - Handle the uffd case and the VMA limits case when building mapping for
>    large folio in the finish_fault() function, per Ryan.
>  - Remove unnecessary 'order' variable in patch 3, per Kefeng.
>  - Keep the anon shmem counters' name consistency.
>  - Modify the strategy to support mTHP for anonymous shmem, discussed with
>    Ryan and David.
>  - Add reviewed tag from Barry.
>  - Update the commit message.
>
> Baolin Wang (6):
>   mm: memory: extend finish_fault() to support large folio
>   mm: shmem: add THP validation for PMD-mapped THP related statistics
>   mm: shmem: add multi-size THP sysfs interface for anonymous shmem
>   mm: shmem: add mTHP support for anonymous shmem
>   mm: shmem: add mTHP size alignment in shmem_get_unmapped_area
>   mm: shmem: add mTHP counters for anonymous shmem
>
>  Documentation/admin-guide/mm/transhuge.rst |  23 ++
>  include/linux/huge_mm.h                    |  23 ++
>  mm/huge_memory.c                           |  17 +-
>  mm/memory.c                                |  57 +++-
>  mm/shmem.c                                 | 344 ++++++++++++++++++---
>  5 files changed, 403 insertions(+), 61 deletions(-)
>
> --
> 2.39.3
>