+David

On Mon, Jan 22, 2024, Tao Su wrote:
> In dirty_log_page_splitting_test, vm_get_stat(vm, "pages_4k") has
> probability of gradually reducing before enabling dirty logging. The
> reason is the backing sources of some pages (test code and per-vCPU
> stacks) are not HugeTLB, leading to the possibility of being migrated.
>
> Requiring NUMA balancing be disabled isn't going to fix the underlying
> issue, it's just guarding against one of the more likely culprits.
> Therefore, precisely validate only the test data pages, i.e. ensure
> no huge pages left and the number of all 4k pages should be at least
> equal to the split pages after splitting.
>
> Reported-by: Yi Lai <yi1.lai@xxxxxxxxx>
> Signed-off-by: Tao Su <tao1.su@xxxxxxxxxxxxxxx>
> Tested-by: Yi Lai <yi1.lai@xxxxxxxxx>
> ---
> Changelog:
>
> v2:
>   - Drop the requirement of NUMA balancing
>   - Change the ASSERT conditions
>
> v1:
> https://lore.kernel.org/all/20240117064441.2633784-1-tao1.su@xxxxxxxxxxxxxxx/
> ---
>  .../kvm/x86_64/dirty_log_page_splitting_test.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> index 634c6bfcd572..63f9cd2b1e31 100644
> --- a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> @@ -92,7 +92,7 @@ static void run_test(enum vm_guest_mode mode, void *unused)
>  	uint64_t host_num_pages;
>  	uint64_t pages_per_slot;
>  	int i;
> -	uint64_t total_4k_pages;
> +	uint64_t split_4k_pages;
>  	struct kvm_page_stats stats_populated;
>  	struct kvm_page_stats stats_dirty_logging_enabled;
>  	struct kvm_page_stats stats_dirty_pass[ITERATIONS];
> @@ -166,9 +166,8 @@ static void run_test(enum vm_guest_mode mode, void *unused)
>  	memstress_destroy_vm(vm);
>
>  	/* Make assertions about the page counts. */
> -	total_4k_pages = stats_populated.pages_4k;
> -	total_4k_pages += stats_populated.pages_2m * 512;
> -	total_4k_pages += stats_populated.pages_1g * 512 * 512;
> +	split_4k_pages = stats_populated.pages_2m * 512;
> +	split_4k_pages += stats_populated.pages_1g * 512 * 512;
>
>  	/*
>  	 * Check that all huge pages were split. Since large pages can only
> @@ -180,11 +179,13 @@ static void run_test(enum vm_guest_mode mode, void *unused)
>  	 */
>  	if (dirty_log_manual_caps) {
>  		TEST_ASSERT_EQ(stats_clear_pass[0].hugepages, 0);
> -		TEST_ASSERT_EQ(stats_clear_pass[0].pages_4k, total_4k_pages);
> +		TEST_ASSERT(stats_clear_pass[0].pages_4k >= split_4k_pages,
> +			    "The number of 4k pages should be at least equal to the split pages");
>  		TEST_ASSERT_EQ(stats_dirty_logging_enabled.hugepages, stats_populated.hugepages);
>  	} else {
>  		TEST_ASSERT_EQ(stats_dirty_logging_enabled.hugepages, 0);
> -		TEST_ASSERT_EQ(stats_dirty_logging_enabled.pages_4k, total_4k_pages);
> +		TEST_ASSERT(stats_dirty_logging_enabled.pages_4k >= split_4k_pages,
> +			    "The number of 4k pages should be at least equal to the split pages");
>  	}
>
>  	/*
> @@ -192,7 +193,6 @@ static void run_test(enum vm_guest_mode mode, void *unused)
>  	 * memory again, the page counts should be the same as they were
>  	 * right after initial population of memory.
>  	 */
> -	TEST_ASSERT_EQ(stats_populated.pages_4k, stats_repopulated.pages_4k);
>  	TEST_ASSERT_EQ(stats_populated.pages_2m, stats_repopulated.pages_2m);
>  	TEST_ASSERT_EQ(stats_populated.pages_1g, stats_repopulated.pages_1g);

Isn't it possible that something other than guest data could be mapped by a
THP hugepage, and that that hugepage could get shattered between the initial
run and the re-population run?

The test knows (or at least, darn well should know) exactly how much memory is
being dirty logged. Rather than rely *only* on before/after heuristics, can't
we assert that the _delta_, i.e. the number of hugepages that are split, and
then the number of hugepages that are reconstituted, is greater than or equal
to the size of the memslots being dirty logged?

>  }
>
> base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
> --
> 2.34.1
>