On 22/04/2024 08:02, Baolin Wang wrote:
> Anonymous pages have already been supported for multi-size (mTHP) allocation
> through commit 19eaf44954df, that can allow THP to be configured through the
> sysfs interface located at '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
>
> However, the anonymous shared pages will ignore the anonymous mTHP rule
> configured through the sysfs interface, and can only use the PMD-mapped
> THP, that is not reasonable. Many implement anonymous page sharing through
> mmap(MAP_SHARED | MAP_ANONYMOUS), especially in database usage scenarios,
> therefore, users expect to apply an unified mTHP strategy for anonymous pages,
> also including the anonymous shared pages, in order to enjoy the benefits of
> mTHP. For example, lower latency than PMD-mapped THP, smaller memory bloat
> than PMD-mapped THP, contiguous PTEs on ARM architecture to reduce TLB miss etc.

This sounds like a very useful addition!

Out of interest, can you point me at any workloads (and off-the-shelf benchmarks
for those workloads) that predominantly use shared anon memory?

>
> The primary strategy is that, the use of huge pages for anonymous shared pages
> still follows the global control determined by the mount option "huge=" parameter
> or the sysfs interface at '/sys/kernel/mm/transparent_hugepage/shmem_enabled'.
> The utilization of mTHP is allowed only when the global 'huge' switch is enabled.
> Subsequently, the mTHP sysfs interface (/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled)
> is checked to determine the mTHP size that can be used for large folio allocation
> for these anonymous shared pages.

I'm not sure about this proposed control mechanism; won't it break
compatibility? I could be wrong, but I don't think shmem's use of THP used to
depend upon the value of /sys/kernel/mm/transparent_hugepage/enabled?
So it doesn't make sense to me that we now depend upon the
/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled values (which by
default disable all sizes except 2M, which is set to "inherit" from
/sys/kernel/mm/transparent_hugepage/enabled).

The other problem is that shmem_enabled has a different set of options
(always/never/within_size/advise/deny/force) to enabled (always/madvise/never).

Perhaps it would be cleaner to do the same trick we did for enabled: introduce
/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled, which can have all the
same values as the top-level /sys/kernel/mm/transparent_hugepage/shmem_enabled,
plus the additional "inherit" option. By default all sizes will be set to
"never" except 2M, which is set to "inherit".

Of course the huge= mount option would also need to take a per-size option in
this case, e.g. huge=2048kB:advise,64kB:always

>
> TODO:
> - More testing and provide some performance data.
> - Need more discussion about the large folio allocation strategy for a 'regular
> file' operation created by memfd_create(), for example using ftruncate(fd) to specify
> the 'file' size, which need to follow the anonymous mTHP rule too?
> - Do not split the large folio when share memory swap out.
> - Can swap in a large folio for share memory.
>
> Baolin Wang (5):
>   mm: memory: extend finish_fault() to support large folio
>   mm: shmem: add an 'order' parameter for shmem_alloc_hugefolio()
>   mm: shmem: add THP validation for PMD-mapped THP related statistics
>   mm: shmem: add mTHP support for anonymous share pages
>   mm: shmem: add anonymous share mTHP counters
>
>  include/linux/huge_mm.h |   4 +-
>  mm/huge_memory.c        |   8 ++-
>  mm/memory.c             |  25 +++++++---
>  mm/shmem.c              | 107 ++++++++++++++++++++++++++++++----------
>  4 files changed, 108 insertions(+), 36 deletions(-)
>
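
To make the per-size suggestion above a bit more concrete, here is a rough
userspace sketch (Python, purely illustrative) of how such a policy might
resolve. It is not kernel code and touches no real sysfs files. The function
names parse_huge_option() and effective_policy() are hypothetical, the
"huge=<size>kB:<policy>,..." syntax is only the strawman from this mail, and
the sketch assumes "deny"/"force" at the top level get no special handling.

```python
# Illustrative model of the *proposed* per-size shmem_enabled policy.
# Hypothetical names throughout; nothing here exists in the kernel today.

TOP_LEVEL_VALUES = {"always", "never", "within_size", "advise", "deny", "force"}
PER_SIZE_VALUES = TOP_LEVEL_VALUES | {"inherit"}


def parse_huge_option(opt):
    """Parse a strawman option like 'huge=2048kB:advise,64kB:always'
    into a dict mapping size in kB to policy string."""
    key, _, value = opt.partition("=")
    if key != "huge" or not value:
        raise ValueError(f"not a huge= option: {opt!r}")
    policies = {}
    for item in value.split(","):
        size, sep, policy = item.partition(":")
        if not sep or not size.endswith("kB") or not size[:-2].isdigit():
            raise ValueError(f"bad per-size entry: {item!r}")
        if policy not in PER_SIZE_VALUES:
            raise ValueError(f"unknown policy: {policy!r}")
        policies[int(size[:-2])] = policy
    return policies


def effective_policy(size_kb, per_size, top_level):
    """Resolve the policy for one folio size.

    Sizes with no explicit setting default to 'never', except 2M (2048kB),
    which defaults to 'inherit' and therefore follows the top-level value.
    """
    if top_level not in TOP_LEVEL_VALUES:
        raise ValueError(f"unknown top-level policy: {top_level!r}")
    default = "inherit" if size_kb == 2048 else "never"
    policy = per_size.get(size_kb, default)
    return top_level if policy == "inherit" else policy


if __name__ == "__main__":
    per_size = parse_huge_option("huge=2048kB:advise,64kB:always")
    for size in (16, 64, 2048):
        print(f"{size}kB ->", effective_policy(size, per_size, top_level="never"))
```

The point of the defaults is that with nothing configured per size, only 2M is
eligible and it follows the existing top-level shmem_enabled value, so today's
behaviour would be preserved.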