[RFC PATCH 1/4] mcpage: add size/mask/shift definition for multiple consecutive page

Yin Fengwei <fengwei.yin@xxxxxxxxx> · Mon, 9 Jan 2023 15:22:29 +0800

Huge page in current kernel could bring obvious performance improvement
for some workloads with less TLB missing and less page fault. But the
limited options of huge page size (2M/1G for x86_64) also brings extra
cost like larger memory consumption, and more CPU cycle for page zeroing.

The idea of the multiple consecutive page (abbr as "mcpage") is using
collection of physical contiguous 4K page other than huge page for
anonymous mapping. Target is to have more choices to trade off the pros
and cons of huge page. Comparing to huge page, it will not get so much
benefit of TLB missing and page fault. And it will not pay too much extra
cost for large memory consumption and larger latency introduced by page
compaction, page zeroing etc.

The size of mcpage can be configured. The default value of 16K size is
just picked up arbitrarily. User should choose the value according to the
result of tuning their workload with different mcpage size.

To have physical contiguous pages, high order pages is allocated (order
is calculated according to mcpage size). Then the high order page will
be split. By doing this, each sub page of mcpage is just normal 4K page.
The current kernel page management infrastructure is applied to "mc"
pages without any change.

To reduce the page fault number, multiple page table entries are populated
in one page fault with sub pages pfn of mcpage. This also brings a little
bit cost of memory consumption.

Update Kconfig to allow user define the mcpage order. Define MACROs like
mcpage mask/shift/nr/size.

In this RFC patch, only Kconfig is used for mcpage order to show the idea.
Runtime parameter will be chosen if make this official patch in the future.

Signed-off-by: Yin Fengwei <fengwei.yin@xxxxxxxxx>
---
 include/linux/mm_types.h | 11 +++++++++++
 mm/Kconfig               | 19 +++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3b8475007734..fa561c7b6290 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -71,6 +71,17 @@ struct mem_cgroup;
 #define _struct_page_alignment	__aligned(sizeof(unsigned long))
 #endif
 
+#ifdef CONFIG_MCPAGE_ORDER
+#define MCPAGE_ORDER		CONFIG_MCPAGE_ORDER
+#else
+#define MCPAGE_ORDER		0
+#endif
+
+#define MCPAGE_SIZE		(1 << (MCPAGE_ORDER + PAGE_SHIFT))
+#define MCPAGE_MASK		(~(MCPAGE_SIZE - 1))
+#define MCPAGE_SHIFT		(MCPAGE_ORDER + PAGE_SHIFT)
+#define MCPAGE_NR		(1 << (MCPAGE_ORDER))
+
 struct page {
 	unsigned long flags;		/* Atomic flags, some possibly
 					 * updated asynchronously */
diff --git a/mm/Kconfig b/mm/Kconfig
index ff7b209dec05..c202dc99ab6d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -650,6 +650,25 @@ config HUGETLB_PAGE_SIZE_VARIABLE
 	  Note that the pageblock_order cannot exceed MAX_ORDER - 1 and will be
 	  clamped down to MAX_ORDER - 1.
 
+config MCPAGE
+	bool "multiple consecutive page <mcpage>"
+	default n
+	help
+	  Enable multiple consecutive page: mcpage is page collections (sub-page)
+	  which are physical contiguous. When mapping to user space, all the
+	  sub-pages will be mapped to user space in one page fault handler.
+	  Expect to trade off the pros and cons of huge page. Like less
+	  unnecessary extra memory zeroing and less memory consumption.
+	  But with no TLB benefit.
+
+config MCPAGE_ORDER
+	int "multiple consecutive page order"
+	default 2
+	depends on X86_64 && MCPAGE
+	help
+	  The order of mcpage. Should be chosen carefully by tuning your
+	  workload.
+
 config CONTIG_ALLOC
 	def_bool (MEMORY_ISOLATION && COMPACTION) || CMA
 
-- 
2.30.2