On 13/11/2023 15:04, Matthew Wilcox wrote:
> On Mon, Nov 13, 2023 at 10:19:48AM +0000, Ryan Roberts wrote:
>> On 13/11/2023 05:18, Matthew Wilcox wrote:
>>> My hope is to abolish the 64kB page size configuration. ie instead of
>>> using the mixture of page sizes that you currently are -- 64k and
>>> 1M (right? Order-0, and order-4)
>>
>> Not quite; the contpte-size for a 64K page size is 2M/order-5. (and yes, it is
>> 64K/order-4 for a 4K page size, and 2M/order-7 for a 16K page size. I agree that
>> intuitively you would expect the order to remain constant, but it doesn't).
>>
>> The "recommend" setting above will actually enable order-3 as well even though
>> there is no HW benefit to this. So the full set of available memory sizes here is:
>>
>> 64K/order-0, 512K/order-3, 2M/order-5, 512M/order-13
>>
>>> , that 4k, 64k and 2MB (order-0,
>>> order-4 and order-9) will provide better performance.
>>>
>>> Have you run any experiments with a 4kB page size?
>>
>> Agree that would be interesting with 64K small-sized THP enabled. And I'd love
>> to get to a world where we universally deal in variable sized chunks of memory,
>> aligned on 4K boundaries.
>>
>> In my experience though, there are still some performance benefits to 64K base
>> page vs 4K+contpte; the page tables are more cache efficient for the former case
>> - 64K of memory is described by 8 bytes in the former vs 8x16=128 bytes in the
>> latter. In practice the HW will still only read 8 bytes in the latter but that's
>> taking up a full cache line vs the former where a single cache line stores 8x
>> 64K entries.
>
> This is going to depend on your workload though -- if you're using more
> 2MB than 64kB, you get to elide a layer of page table with 4k base,
> rather than taking up 4 cache lines with a 64k base.

True, but again depending on workload/config, you may have fewer levels of lookup
for the 64K native case in the first place because you consume more VA bits at
each level.
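
For anyone wanting to check the figures in this exchange, here is a rough Python sketch of the arithmetic (not kernel code). It assumes 8-byte page table entries and 64-byte cache lines, which is what the numbers above imply, and a 48-bit VA, which is purely an assumption for illustration. The helper names are made up, and the level count uses the usual "each table fills one base page" rule rather than anything taken from the kernel sources.

```python
KiB, MiB = 1 << 10, 1 << 20

PTE_BYTES = 8      # one page table entry, per the "8 bytes" figure above
CACHE_LINE = 64    # assumed cache line size (8 entries per line)

# contpte block size for each base page size, as stated in the thread
CONTPTE_SIZE = {4 * KiB: 64 * KiB, 16 * KiB: 2 * MiB, 64 * KiB: 2 * MiB}


def order_of(size, base_page):
    """Return n such that base_page << n == size (size must be a power-of-2 multiple)."""
    pages, rem = divmod(size, base_page)
    if rem or pages & (pages - 1):
        raise ValueError("size is not a power-of-two multiple of base_page")
    return pages.bit_length() - 1


def pte_bytes(size, base_page):
    """Bytes of last-level entries needed to describe `size` using base pages."""
    return (size // base_page) * PTE_BYTES


def cache_lines(nbytes):
    """Cache lines spanned by nbytes of naturally aligned entries (rounded up)."""
    return -(-nbytes // CACHE_LINE)


def table_levels(va_bits, base_page):
    """Lookup levels, assuming each table occupies exactly one base page."""
    page_shift = base_page.bit_length() - 1
    bits_per_level = page_shift - 3          # log2(base_page / PTE_BYTES)
    return -(-(va_bits - page_shift) // bits_per_level)


if __name__ == "__main__":
    for base, cont in CONTPTE_SIZE.items():
        print(f"{base // KiB}K base: contpte {cont // KiB}K, "
              f"order-{order_of(cont, base)}")
    # 64K of memory: one entry with a 64K base, sixteen with a 4K base
    print(pte_bytes(64 * KiB, 64 * KiB), pte_bytes(64 * KiB, 4 * KiB))
    # 2M mapped as contpte with a 64K base: 32 entries
    print(cache_lines(pte_bytes(2 * MiB, 64 * KiB)))
    # lookup levels for the assumed 48-bit VA
    print(table_levels(48, 4 * KiB), table_levels(48, 64 * KiB))
```

Under those assumptions, this gives order-4, order-7 and order-5 for the three contpte sizes, 8 vs 128 bytes to describe 64K, 4 cache lines for a 2M contpte block with a 64K base, and 4 vs 3 lookup levels for 4K vs 64K base pages. The last pair is the "fewer levels of lookup" point. With a 4K base, a 2M block mapping also drops the last level, which is the "elide a layer" point.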