On Wed, Jan 17, 2024, Tao Su wrote:
> In dirty_log_page_splitting_test, vm_get_stat(vm, "pages_4k") can
> gradually drop to 0 after a VM-Exit. The reason is that the host
> triggers NUMA balancing and unmaps the related SPTEs, so the number of
> pages currently mapped in EPT (kvm->stat.pages) no longer matches the
> pages touched by the guest. This causes stats_populated.pages_4k and
> stats_repopulated.pages_4k in this test to differ, resulting in
> failure.

...

> dirty_log_page_splitting_test assumes that kvm->stat.pages and the pages
> touched by the guest are the same, but that assumption no longer holds
> if NUMA balancing is enabled. Add a requirement that numa_balancing be
> disabled to avoid confusion due to test failures.
>
> Actually, any page migration (including NUMA balancing) will trigger this
> issue, e.g. running the script:
>
>   ./x86_64/dirty_log_page_splitting_test &
>   PID=$!
>   sleep 1
>   migratepages $PID 0 1
>
> It is unusual to create the above test environment intentionally, but
> NUMA balancing initiated by the kernel will most likely be triggered,
> at least in dirty_log_page_splitting_test.
>
> Reported-by: Yi Lai <yi1.lai@xxxxxxxxx>
> Signed-off-by: Tao Su <tao1.su@xxxxxxxxxxxxxxx>
> Tested-by: Yi Lai <yi1.lai@xxxxxxxxx>
> ---
>  .../kvm/x86_64/dirty_log_page_splitting_test.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> index 634c6bfcd572..f2c796111d83 100644
> --- a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> @@ -212,10 +212,21 @@ static void help(char *name)
>
>  int main(int argc, char *argv[])
>  {
> +	FILE *f;
>  	int opt;
> +	int ret, numa_balancing;
>
>  	TEST_REQUIRE(get_kvm_param_bool("eager_page_split"));
>  	TEST_REQUIRE(get_kvm_param_bool("tdp_mmu"));
> +	f = fopen("/proc/sys/kernel/numa_balancing", "r");
> +	if (f) {
> +		ret = fscanf(f, "%d", &numa_balancing);
> +		TEST_ASSERT(ret == 1, "Error reading numa_balancing");
> +		TEST_ASSERT(!numa_balancing, "please run "
> +			    "'echo 0 > /proc/sys/kernel/numa_balancing'");

If we go this route, this should be a TEST_REQUIRE(), not a TEST_ASSERT().
The test hasn't failed; rather, it has detected an incompatible setup.

Something isn't right though.  The test defaults to HugeTLB, and the
invocation in the changelog doesn't override the backing source.  That
suggests that NUMA auto-balancing is zapping HugeTLB VMAs, which AFAIK
shouldn't happen, e.g. this code in task_numa_work() should cause such
VMAs to be skipped:

		if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
			is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) {
			trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_UNSUITABLE);
			continue;
		}

And the test already warns the user if they opt to use something other
than HugeTLB:

	if (!is_backing_src_hugetlb(backing_src)) {
		pr_info("This test will only work reliably with HugeTLB memory. "
			"It can work with THP, but that is best effort.\n");
	}

If the test is defaulting to something other than HugeTLB, then we should
fix that in the test.  If the kernel is doing NUMA balancing on HugeTLB
VMAs, then we should fix that in the kernel.
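
As an illustration only, if the sysctl check does end up staying, a minimal
sketch of the TEST_REQUIRE() variant could look like the below.  This is a
hedged rework of the proposed hunk, not the submitted patch: it assumes the
__TEST_REQUIRE() helper from test_util.h for a custom skip message, reads the
same /proc/sys/kernel/numa_balancing interface as the patch, and is meant to
sit in main() of the test alongside the existing TEST_REQUIRE() calls.

	/*
	 * Sketch: skip (don't fail) when NUMA balancing is enabled, since
	 * the test hasn't failed, it has detected an incompatible setup.
	 */
	FILE *f;
	int numa_balancing = 0;

	f = fopen("/proc/sys/kernel/numa_balancing", "r");
	if (f) {
		/* If the sysctl can't be parsed, assume balancing is off. */
		if (fscanf(f, "%d", &numa_balancing) != 1)
			numa_balancing = 0;
		fclose(f);
	}
	__TEST_REQUIRE(!numa_balancing,
		       "NUMA balancing is enabled, run 'echo 0 > /proc/sys/kernel/numa_balancing'");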