On 8/2/22 19:22, Sean Christopherson wrote:
> Userspace can already force the ideal setup for eager page splitting by configuring vNUMA-aware memslots and using a task with appropriate policy to toggle dirty logging. And userspace really should be encouraged to do that, because otherwise walking the page tables in software to do the split is going to be constantly accessing remote memory.
Yes, it's possible to locate the page tables on the node that holds the memory they're mapping by enabling dirty logging from different tasks for different memslots, but that seems a bit weird.
Walking the page tables in software will incur several remote memory accesses, but it does so in a thread that is probably devoted to background tasks anyway. The relative impact of remote memory accesses in the thread that enables dirty logging vs. in the vCPU thread should also be visible in the overall performance of dirty_log_perf_test.
So I agree with Vipin's patch and would even extend it to all page table allocations; however, dirty_log_perf_test should be run with fixed CPU mappings to accurately measure the impact of the remote memory accesses.
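
For illustration only, "fixed CPU mappings" could mean something like the sketch below, in which each test thread pins itself to a single CPU so that the scheduler cannot migrate it across nodes in the middle of a run. The helper name pin_self_to_cpu() is hypothetical and is not part of dirty_log_perf_test; only sched_setaffinity() and the CPU_* macros are real glibc interfaces.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Hypothetical helper: restrict the calling thread to one CPU.
 * Returns 0 on success, -1 on error (errno set by sched_setaffinity).
 */
static int pin_self_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	/* A pid of 0 means "the calling thread". */
	return sched_setaffinity(0, sizeof(set), &set);
}
```

Which CPU each vCPU thread and the thread that enables dirty logging should be pinned to depends on the host topology, so that choice is left to whoever runs the test.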
Paolo