On 11/22/23 08:29, Ryan Roberts wrote: ...
Prerequisites
=============

Some work items identified as being prerequisites are listed on page 3 at [8]. The summary is:

| item                          | status                  |
|:------------------------------|:------------------------|
| mlock                         | In mainline (v6.7)      |
| madvise                       | In mainline (v6.6)      |
| compaction                    | v1 posted [9]           |
| numa balancing                | Investigated: see below |
| user-triggered page migration | In mainline (v6.7)      |
| khugepaged collapse           | In mainline (NOP)       |

On NUMA balancing, which currently ignores any PTE-mapped THPs it encounters, John Hubbard has investigated this and concluded that A) it is not clear at the moment what a better policy might be for PTE-mapped THP, and B) it is questionable whether this should really be considered a prerequisite, given that no regression is caused for the default "small-sized THP disabled" case and there is no correctness issue when it is enabled; it's just a potential for non-optimal performance. (John, please do elaborate if I haven't captured this correctly!)
That's accurate. I actually want to continue looking into this (Mel Gorman's recent replies to v6 provided helpful touchstones for the NUMA reasoning leading up to the present day), and maybe at least bring pte-thps into rough parity with THPs with respect to NUMA. But that really doesn't seem like something that needs to happen first, especially since the outcome might even be "first, do no harm": as in, it's better as-is. We'll see.
If there are no disagreements about removing numa balancing from the list, then that just leaves compaction, which is under review on the list at the moment. I really would like to get this series (and its remaining compaction prerequisite) in for v6.8. I accept that it may be a bit optimistic at this point, but let's see where we get to with review?

Testing
=======

The series includes patches for mm selftests to enlighten the cow and khugepaged tests to explicitly test with small-order THP, in the same way that PMD-order THP is tested. The new tests all pass, and no regressions are observed in the mm selftest suite. I've also run my usual kernel compilation and JavaScript benchmarks without any issues.

Refer to my performance numbers posted with v6 [6]. (These are for small-sized THP only; they do not include the arm64 contpte follow-on series.)

John Hubbard at Nvidia has indicated dramatic 10x performance improvements for some workloads at [10]. (Observed using v6 of this series as well as the arm64 contpte series.)
Testing continues. Some workloads do even better than 10x; it's quite remarkable and glorious to see. :) I can send more perf data, perhaps in a few days or a week, if there is still doubt about the benefits. That was with the v6 series, though. I'm about to set up and run with v7, and expect to provide a Tested-by tag for functionality sometime soon (in the next few days), if machine availability works out as expected.

thanks,
--
John Hubbard
NVIDIA