Re: [PATCH 2/2] selftests/migration: Disable NUMA balancing and check migration status

On 07/08/2023 07:39, Alistair Popple wrote:
> The migration selftest was only checking the return code and not the
> status array for migration success/failure. Update the test to check
> both. This uncovered a bug in the return code handling of
> do_pages_move().
> 
> Also disable NUMA balancing as that can lead to unexpected migration
> failures.
> 
> Signed-off-by: Alistair Popple <apopple@xxxxxxxxxx>
> Suggested-by: Ryan Roberts <ryan.roberts@xxxxxxx>
> ---
> 
> Ryan, this will still cause the test to fail if a migration failed. I
> was unable to reproduce a migration failure for any cases on my system
> once I disabled NUMA balancing though so I'd be curious if you are
> still seeing failures with this patch applied. AFAIK there shouldn't
> be anything else that would be causing migration failure so would like
> to know what is causing failures. Thanks!


Hi Alistair,

Afraid I'm still seeing unmigrated pages when running with these 2 patches:


#  RUN           migration.shared_anon ...
Didn't migrate 1 pages
# migration.c:183:shared_anon:Expected migrate(ptr, self->n1, self->n2) (-2) == 0 (0)
# shared_anon: Test terminated by assertion
#          FAIL  migration.shared_anon
not ok 2 migration.shared_anon


I added some instrumentation: it usually fails on the second pass through the loop in migrate(), but I've also seen it fail on the first. I have never seen it get through 2 iterations successfully.
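
For anyone wanting to reproduce this, here is a minimal sketch of the kind of per-page check that could be layered on top of the return code. It is illustrative only and is not the instrumentation used above. The helper name count_unmigrated() and its parameters are hypothetical. It assumes the usual move_pages() convention that each status entry holds either the node the page now resides on (>= 0) or a negative errno value.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of the patch: count the entries in a
 * move_pages() status array that did not end up on target_node.
 * Assumes each entry is either a node id (>= 0) or a negative errno.
 */
static size_t count_unmigrated(const int *status, size_t count,
			       int target_node)
{
	size_t i, failed = 0;

	for (i = 0; i < count; i++) {
		if (status[i] < 0) {
			/* The kernel reported a per-page error, e.g. -EBUSY. */
			fprintf(stderr, "page %zu: error %d\n", i, status[i]);
			failed++;
		} else if (status[i] != target_node) {
			/* No error, but the page is on a different node. */
			fprintf(stderr, "page %zu: on node %d, expected %d\n",
				i, status[i], target_node);
			failed++;
		}
	}

	return failed;
}
```

With the single-page test in this patch, count would be 1 and target_node would be n2. A status value left at its initial sentinel would also be counted, since it would not match the target node.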

I did also try just this patch without the error handling update in the kernel, but it still fails in the same way.

I'm running on arm64 in case that wasn't clear. Let me know if there is anything I can do to help debug.

Thanks,
Ryan


> 
>  tools/testing/selftests/mm/migration.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/mm/migration.c b/tools/testing/selftests/mm/migration.c
> index 379581567f27..cf079af5799b 100644
> --- a/tools/testing/selftests/mm/migration.c
> +++ b/tools/testing/selftests/mm/migration.c
> @@ -51,6 +51,12 @@ FIXTURE_SETUP(migration)
>  	ASSERT_NE(self->threads, NULL);
>  	self->pids = malloc(self->nthreads * sizeof(*self->pids));
>  	ASSERT_NE(self->pids, NULL);
> +
> +	/*
> +	 * Disable NUMA balancing which can cause migration
> +	 * failures.
> +	 */
> +	numa_set_membind(numa_all_nodes_ptr);
>  };
>  
>  FIXTURE_TEARDOWN(migration)
> @@ -62,13 +68,14 @@ FIXTURE_TEARDOWN(migration)
>  int migrate(uint64_t *ptr, int n1, int n2)
>  {
>  	int ret, tmp;
> -	int status = 0;
>  	struct timespec ts1, ts2;
>  
>  	if (clock_gettime(CLOCK_MONOTONIC, &ts1))
>  		return -1;
>  
>  	while (1) {
> +		int status = NUMA_NUM_NODES + 1;
> +
>  		if (clock_gettime(CLOCK_MONOTONIC, &ts2))
>  			return -1;
>  
> @@ -85,6 +92,15 @@ int migrate(uint64_t *ptr, int n1, int n2)
>  			return -2;
>  		}
>  
> +		/*
> +		 * Note we should never see this because move_pages() should
> +		 * have indicated a page couldn't migrate above.
> +		 */
> +		if (status < 0) {
> +			printf("Page didn't migrate, error %d\n", status);
> +			return -2;
> +		}
> +
>  		tmp = n2;
>  		n2 = n1;
>  		n1 = tmp;




