On Fri, Apr 15, 2022 at 09:58:44PM +0000, Oliver Upton wrote:

[...]

>
> Smoke tested with KVM selftests + kvm_page_table_test w/ 2M hugetlb to
> exercise the table pruning code. Haven't done anything beyond this,
> sending as an RFC now to get eyes on the code.

Ok, got around to testing this thing a bit harder. Keep in mind that
permission faults at PAGE_SIZE granularity already go on the read side
of the lock. I used the dirty_log_perf_test with 4G/vCPU and anonymous
THP all the way up to 48 vCPUs. Here is the data as it compares to
5.18-rc2.

Dirty log time (split 2M -> 4K):

+-------+----------+-------------------+
| vCPUs | 5.18-rc2 | 5.18-rc2 + series |
+-------+----------+-------------------+
|     1 |    0.83s |             0.85s |
|     2 |    0.95s |             1.07s |
|     4 |    2.65s |             1.13s |
|     8 |    4.88s |             1.33s |
|    16 |    9.71s |             1.73s |
|    32 |   20.43s |             3.99s |
|    48 |   29.15s |             6.28s |
+-------+----------+-------------------+

The scaling of the prefaulting pass looks better too (same config):

+-------+----------+-------------------+
| vCPUs | 5.18-rc2 | 5.18-rc2 + series |
+-------+----------+-------------------+
|     1 |    0.42s |             0.18s |
|     2 |    0.55s |             0.19s |
|     4 |    0.79s |             0.27s |
|     8 |    1.29s |             0.35s |
|    16 |    2.03s |             0.53s |
|    32 |    4.03s |             1.01s |
|    48 |    6.10s |             1.51s |
+-------+----------+-------------------+

--
Thanks,
Oliver