On Mon, Jan 09, 2023 at 03:22:28PM +0800, Yin Fengwei wrote:
> In a nutshell: 4k is too small and 2M is too big. We started
> asking ourselves whether there was something in the middle that
> we could do. This series shows what that middle ground might
> look like. It provides some of the benefits of THP while
> eliminating some of the downsides.
>
> This series uses "multiple consecutive pages" (mcpages) of
> between 8K and 2M of base pages for anonymous user space mappings.
> This will lead to less internal fragmentation versus 2M mappings
> and thus less memory consumption and wasted CPU time zeroing
> memory which will never be used.
>
> In the implementation, we allocate a high order page with the order
> of the mcpage (e.g., order 2 for a 16KB mcpage). This makes sure
> physically contiguous memory is used and benefits sequential memory
> access latency.
>
> Then we split the high order page. By doing this, each sub-page of
> the mcpage is just a normal 4K page. The current kernel page
> management is applied to "mc" pages without any changes. Batching
> page faults is allowed with mcpage and reduces the number of page
> faults.
>
> There are costs with mcpage. Besides lacking the TLB benefit THP
> brings, it increases memory consumption and page allocation latency
> compared to 4K base pages.
>
> This series is the first step of mcpage. Future work can enable
> mcpage for more components like page cache, swapping etc.
> Finally, most pages in the system will be allocated/freed/reclaimed
> with mcpage order.

It's not worth adding a new path in page fault handling. We need to make
existing mechanisms more flexible.

I think it has to be done on top of folios:

1. Convert anonymous memory to folios. Only order-9 (HPAGE_PMD_ORDER)
   and order-0 at first.
2. Remove the assumption of THP being order-9.
3. Start allocating THPs <order-9.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
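
For reference, the order/size arithmetic behind the numbers in this thread
(order 2 for 16KB, order 9 for 2M) can be sketched in plain userspace C.
This is only an illustration, not code from the series or from the kernel:
it assumes 4K base pages, and the names BASE_PAGE_SHIFT, PMD_ORDER,
order_to_bytes(), bytes_to_order() and wasted_bytes() are made up for the
example.

```c
#include <assert.h>
#include <stddef.h>

#define BASE_PAGE_SHIFT 12 /* assumption: 4K base pages */
#define PMD_ORDER       9  /* order-9 is 2M when base pages are 4K */

/* Size in bytes of one physically contiguous allocation of this order. */
static size_t order_to_bytes(unsigned int order)
{
	assert(order <= PMD_ORDER);
	return (size_t)1 << (BASE_PAGE_SHIFT + order);
}

/* Smallest order that covers `bytes`; capped at PMD_ORDER. */
static unsigned int bytes_to_order(size_t bytes)
{
	unsigned int order = 0;

	while (order < PMD_ORDER && order_to_bytes(order) < bytes)
		order++;
	return order;
}

/*
 * Bytes allocated (and zeroed) but never used when `touched` contiguous
 * bytes are backed only by allocations of the given order. This is the
 * internal fragmentation the cover letter refers to.
 */
static size_t wasted_bytes(size_t touched, unsigned int order)
{
	size_t size = order_to_bytes(order);
	size_t chunks = (touched + size - 1) / size; /* round up */

	return chunks * size - touched;
}
```

With these helpers, a mapping that only ever touches 20K wastes nothing
with order-0 pages, 12K with order-2 (16KB) allocations, and almost the
whole 2M with an order-9 allocation, which is the trade-off both the
series and the folio-based approach are trying to tune.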