On 1/9/2023 4:37 PM, Kirill A. Shutemov wrote:
> On Mon, Jan 09, 2023 at 03:22:28PM +0800, Yin Fengwei wrote:
>> In a nutshell: 4k is too small and 2M is too big. We started
>> asking ourselves whether there was something in the middle that
>> we could do. This series shows what that middle ground might
>> look like. It provides some of the benefits of THP while
>> eliminating some of the downsides.
>>
>> This series uses "multiple consecutive pages" (mcpages) of
>> between 8K and 2M of base pages for anonymous user space mappings.
>> This will lead to less internal fragmentation versus 2M mappings
>> and thus less memory consumption and wasted CPU time zeroing
>> memory which will never be used.
>>
>> In the implementation, we allocate high order page with order of
>> mcpage (e.g., order 2 for 16KB mcpage). This makes sure the
>> physical contiguous memory is used and benefit sequential memory
>> access latency.
>>
>> Then split the high order page. By doing this, the sub-page of
>> mcpage is just 4K normal page. The current kernel page
>> management is applied to "mc" pages without any changes. Batching
>> page faults is allowed with mcpage and reduce page faults number.
>>
>> There are costs with mcpage. Besides no TLB benefit THP brings, it
>> increases memory consumption and latency of allocation page
>> comparing to 4K base page.
>>
>> This series is the first step of mcpage. The future work can be
>> enable mcpage for more components like page cache, swapping etc.
>> Finally, most pages in system will be allocated/free/reclaimed
>> with mcpage order.
>
> It isn't worth adding a new path in page fault handling. We need to make
> existing mechanisms more flexible.
>
> I think it has to be done on top of folios:
>
> 1. Convert anonymous memory to folios. Only order-9 (HPAGE_PMD_ORDER) and
>    order-0 at first.
> 2. Remove assumption of THP being order-9.
> 3. Start allocating THPs <order-9.

Thanks a lot for the comments. Really appreciate it.

Regards
Yin, Fengwei
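
As a rough illustration of the size trade-off quoted above (not code from the series), the sketch below maps a block size to its allocation order and estimates how much memory is allocated and zeroed but left untouched when only part of a block is used. The 4 KiB base page size and the helper names `order_for` and `wasted_bytes` are assumptions made for this example.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages, as in the discussion above


def order_for(size_bytes):
    """Return the allocation order whose block size equals size_bytes."""
    pages = size_bytes // PAGE_SIZE
    # Only power-of-two multiples of the base page size map to an order.
    if size_bytes % PAGE_SIZE or pages == 0 or pages & (pages - 1):
        raise ValueError("size must be a power-of-two multiple of PAGE_SIZE")
    return pages.bit_length() - 1


def wasted_bytes(size_bytes, touched_bytes):
    """Bytes allocated (and zeroed) in one block but never touched."""
    touched_pages = -(-touched_bytes // PAGE_SIZE)  # round up to whole pages
    return max(size_bytes - touched_pages * PAGE_SIZE, 0)


if __name__ == "__main__":
    for size in (16 * 1024, 2 * 1024 * 1024):
        print(size, order_for(size), wasted_bytes(size, 1))
```

With these assumptions, a 16KB block is order 2 and a 2M block is order 9, matching the orders mentioned in the thread. If a process touches a single byte, the 16KB block leaves 12KB unused, while the 2M block leaves all but one base page unused.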