On Sun, Nov 12, 2023 at 10:57:47PM -0500, John Hubbard wrote:
> I've done some initial performance testing of this patchset on an arm64
> SBSA server. When these patches are combined with the arm64 arch contpte
> patches in Ryan's git tree (he has conveniently combined everything
> here: [1]), we are seeing a remarkable, consistent speedup of 10.5x on
> some memory-intensive workloads. Many test runs, conducted independently
> by different engineers and on different machines, have convinced me and
> my colleagues that this is an accurate result.
>
> In order to achieve that result, we used the git tree in [1] with
> the following settings:
>
>     echo always >/sys/kernel/mm/transparent_hugepage/enabled
>     echo recommend >/sys/kernel/mm/transparent_hugepage/anon_orders
>
> This was on an aarch64 machine configured to use a 64KB base page size.
> That configuration means that the PMD size is 512MB, which is of course
> too large for practical use as a pure PMD-THP. However, with these
> small-size (less than PMD-sized) THPs, we get the improvements in TLB
> coverage, while still getting pages that are small enough to be
> effectively usable.

That is quite remarkable!

My hope is to abolish the 64kB page size configuration, i.e. that
instead of the mixture of page sizes you are currently using -- 64k and
1M (right? order-0 and order-4) -- a mixture of 4k, 64k and 2MB
(order-0, order-4 and order-9) will provide better performance.

Have you run any experiments with a 4kB page size?
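
For anyone wanting to check the sizes quoted above, here is a rough sketch of the arithmetic. It assumes 8-byte page table entries, so that one table page holds base_page_size / 8 entries; the helper names are made up for illustration and are not kernel APIs.

```python
# Sketch only: helper names are hypothetical, not kernel interfaces.
# Assumption: each page table entry is 8 bytes, so a single table page
# holds (base_page_size // 8) entries.
PTE_BYTES = 8
KB = 1024
MB = 1024 * KB


def thp_size(base_page_size: int, order: int) -> int:
    """Bytes covered by an order-N folio: base page size * 2**order."""
    return base_page_size << order


def pmd_size(base_page_size: int) -> int:
    """Bytes mapped by one PMD entry: a full table of base pages."""
    return base_page_size * (base_page_size // PTE_BYTES)


def pmd_order(base_page_size: int) -> int:
    """Order of a PMD-sized folio: log2 of the entries per table page."""
    return (base_page_size // PTE_BYTES).bit_length() - 1


if __name__ == "__main__":
    for base in (4 * KB, 64 * KB):
        print(f"base page {base // KB}K:")
        print(f"  order-4 folio: {thp_size(base, 4) // KB}K")
        print(f"  PMD: order-{pmd_order(base)}, {pmd_size(base) // MB}M")
```

Under that assumption, a 64K base page gives a 1M order-4 folio and a 512M PMD, while a 4K base page gives a 64K order-4 folio and a 2M (order-9) PMD.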