From: Alexander Zhu <alexlzhu@xxxxxx>

Transparent Hugepages (THPs) use a page size of 2MB, compared with 4KB
for normal pages. A larger page size results in fewer TLB misses and
thus more efficient use of the CPU. It also results in more memory
waste, which can hurt performance in some use cases. THPs are currently
enabled in the Linux kernel by applications for limited virtual address
ranges via the madvise system call. The THP shrinker tries to find a
balance between increased use of THPs and increased use of memory. It
reduces memory usage by removing the underutilized THPs that are
identified by the thp_utilization scanner.

In our experiments we have noticed that the least utilized THPs are
almost entirely unutilized.

Sample Output:

Utilized[0-50]: 1331 680884
Utilized[51-101]: 9 3983
Utilized[102-152]: 3 1187
Utilized[153-203]: 0 0
Utilized[204-255]: 2 539
Utilized[256-306]: 5 1135
Utilized[307-357]: 1 192
Utilized[358-408]: 0 0
Utilized[409-459]: 1 57
Utilized[460-512]: 400 13
Last Scan Time: 223.98s
Last Scan Duration: 70.65s

Above is a sample obtained from one of our test machines when THP is
always enabled. Of the 1331 THPs in this thp_utilization sample that
have 0-50 utilized subpages, we see that there are 680884 free (zero)
pages. This comes out to 680884 / (512 * 1331) = 99.91% zero pages in
the least utilized bucket. This represents roughly 680884 * 4KB = 2.7GB
of memory waste.

Also note that the vast majority of THPs are either in the least
utilized [0-50] or most utilized [460-512] buckets. The least utilized
THPs are responsible for almost all of the memory waste when THP is
always enabled. Thus, by clearing out THPs in the lowest utilization
bucket, we reclaim most of the wasted memory while keeping most of the
improvement in CPU efficiency. We have seen similar results on our
production hosts.

This patchset introduces the THP shrinker we have developed to identify
and split the least utilized THPs.
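
As a rough cross-check of the arithmetic on the sample output above, the
bucket lines can be parsed in user space. The sketch below is
illustrative only and is not part of this series. The helper names
(parse_buckets, zero_fraction, waste_kb) are hypothetical, and it
assumes each bucket line has the form shown in the sample: the range of
utilized subpages, then the number of THPs, then the number of zero
pages.

```python
import re

SUBPAGES_PER_THP = 512  # 2MB THP / 4KB base page
BASE_PAGE_KB = 4

# Bucket lines copied from the sample output in this cover letter.
SAMPLE = """\
Utilized[0-50]: 1331 680884
Utilized[51-101]: 9 3983
Utilized[102-152]: 3 1187
Utilized[153-203]: 0 0
Utilized[204-255]: 2 539
Utilized[256-306]: 5 1135
Utilized[307-357]: 1 192
Utilized[358-408]: 0 0
Utilized[409-459]: 1 57
Utilized[460-512]: 400 13
"""

# Assumed line format: "Utilized[lo-hi]: <thps> <zero_pages>"
BUCKET_RE = re.compile(r"Utilized\[(\d+)-(\d+)\]:\s+(\d+)\s+(\d+)")


def parse_buckets(text):
    """Return a list of (lo, hi, thps, zero_pages) tuples, one per bucket."""
    return [tuple(int(g) for g in m.groups()) for m in BUCKET_RE.finditer(text)]


def zero_fraction(thps, zero_pages):
    """Fraction of all subpages in a bucket that are zero pages."""
    total_subpages = thps * SUBPAGES_PER_THP
    return zero_pages / total_subpages if total_subpages else 0.0


def waste_kb(zero_pages):
    """Memory held by zero pages, in KB (zero pages * 4KB)."""
    return zero_pages * BASE_PAGE_KB


if __name__ == "__main__":
    buckets = parse_buckets(SAMPLE)
    total_zero = sum(zero for _, _, _, zero in buckets)
    for lo, hi, thps, zero in buckets:
        share = zero / total_zero if total_zero else 0.0
        print(f"[{lo}-{hi}] thps={thps} "
              f"zero={zero_fraction(thps, zero):.2%} "
              f"waste={waste_kb(zero)}KB share_of_waste={share:.2%}")
```

Printing each bucket's share of all zero pages alongside its zero-page
fraction makes it easy to see how much of the total waste sits in the
lowest utilization bucket.
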
The patchset includes the thp_utilization changes that group anonymous
THPs into buckets, the split_huge_page() changes that identify and zap
zero 4KB pages within THPs, and the shrinker changes. It should be noted
that the split_huge_page() changes are based on previous work done by
Yu Zhao.

In the future, we intend to allow additional tuning of the shrinker
based on workload, depending on CPU/IO/memory pressure and the amount of
anonymous memory. The long term goal is to eventually always enable THP
for all applications and deprecate madvise entirely.

In production we have so far observed a 2-3% reduction in overall CPU
usage on stateless web servers when THP is always enabled.

Alexander Zhu (5):
  mm: add thp_utilization metrics to debugfs
  mm: changes to split_huge_page() to free zero filled tail pages
  mm: do not remap clean subpages when splitting isolated thp
  mm: add selftests to split_huge_page() to verify unmap/zap of zero pages
  mm: THP low utilization shrinker

 Documentation/admin-guide/mm/transhuge.rst |   9 +
 include/linux/huge_mm.h                    |   9 +
 include/linux/list_lru.h                   |  24 ++
 include/linux/mm_types.h                   |   5 +
 include/linux/rmap.h                       |   2 +-
 include/linux/vm_event_item.h              |   3 +
 mm/Makefile                                |   2 +-
 mm/huge_memory.c                           | 155 +++++++++++-
 mm/list_lru.c                              |  49 ++++
 mm/migrate.c                               |  73 +++++-
 mm/migrate_device.c                        |   4 +-
 mm/page_alloc.c                            |   6 +
 mm/thp_utilization.c                       | 222 ++++++++++++++++++
 mm/vmstat.c                                |   3 +
 .../selftests/vm/split_huge_page_test.c    | 115 ++++++++-
 tools/testing/selftests/vm/vm_util.c       |  23 ++
 tools/testing/selftests/vm/vm_util.h       |   3 +
 17 files changed, 689 insertions(+), 18 deletions(-)
 create mode 100644 mm/thp_utilization.c

--
2.30.2