Ryan Roberts <ryan.roberts@xxxxxxx> writes:

> On 07/08/2023 07:39, Alistair Popple wrote:
>> The migration selftest was only checking the return code and not the
>> status array for migration success/failure. Update the test to check
>> both. This uncovered a bug in the return code handling of
>> do_pages_move().
>>
>> Also disable NUMA balancing as that can lead to unexpected migration
>> failures.
>>
>> Signed-off-by: Alistair Popple <apopple@xxxxxxxxxx>
>> Suggested-by: Ryan Roberts <ryan.roberts@xxxxxxx>
>> ---
>>
>> Ryan, this will still cause the test to fail if a migration failed. I
>> was unable to reproduce a migration failure for any cases on my system
>> once I disabled NUMA balancing though so I'd be curious if you are
>> still seeing failures with this patch applied. AFAIK there shouldn't
>> be anything else that would be causing migration failure so would like
>> to know what is causing failures. Thanks!
>
>
> Hi Alistair,
>
> Afraid I'm still seeing unmigrated pages when running with these 2 patches:
>
>
> # RUN migration.shared_anon ...
> Didn't migrate 1 pages
> # migration.c:183:shared_anon:Expected migrate(ptr, self->n1, self->n2) (-2) == 0 (0)
> # shared_anon: Test terminated by assertion
> # FAIL migration.shared_anon
> not ok 2 migration.shared_anon
>
>
> I added some instrumentation; it usually fails on the second time
> through the loop in migrate() but I've also seen it fail the first
> time. Never seen it get though 2 iterations successfully though.

Interesting. I guess migration failure is always possible for various
reasons so I will update the test to report the number of failed
migrations rather than making it a test failure. I was mostly just
curious as to what would be causing the occasional failures for my own
understanding, but the failures themselves are unimportant.

> I did also try just this patch without the error handling update in the kernel, but it still fails in the same way.
>
> I'm running on arm64 in case that wasn't clear. Let me know if there is anything I can do to help debug.

Thanks! Unless you're concerned about the failures I am happy to ignore
them. Pages can fail to migrate for all sorts of reasons, although I'm a
little surprised anonymous migrations are failing so frequently for you.

> Thanks,
> Ryan
>
>
>>
>>  tools/testing/selftests/mm/migration.c | 18 +++++++++++++++++-
>>  1 file changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/testing/selftests/mm/migration.c b/tools/testing/selftests/mm/migration.c
>> index 379581567f27..cf079af5799b 100644
>> --- a/tools/testing/selftests/mm/migration.c
>> +++ b/tools/testing/selftests/mm/migration.c
>> @@ -51,6 +51,12 @@ FIXTURE_SETUP(migration)
>>  	ASSERT_NE(self->threads, NULL);
>>  	self->pids = malloc(self->nthreads * sizeof(*self->pids));
>>  	ASSERT_NE(self->pids, NULL);
>> +
>> +	/*
>> +	 * Disable NUMA balancing which can cause migration
>> +	 * failures.
>> +	 */
>> +	numa_set_membind(numa_all_nodes_ptr);
>>  };
>>
>>  FIXTURE_TEARDOWN(migration)
>> @@ -62,13 +68,14 @@ FIXTURE_TEARDOWN(migration)
>>  int migrate(uint64_t *ptr, int n1, int n2)
>>  {
>>  	int ret, tmp;
>> -	int status = 0;
>>  	struct timespec ts1, ts2;
>>
>>  	if (clock_gettime(CLOCK_MONOTONIC, &ts1))
>>  		return -1;
>>
>>  	while (1) {
>> +		int status = NUMA_NUM_NODES + 1;
>> +
>>  		if (clock_gettime(CLOCK_MONOTONIC, &ts2))
>>  			return -1;
>>
>> @@ -85,6 +92,15 @@ int migrate(uint64_t *ptr, int n1, int n2)
>>  			return -2;
>>  		}
>>
>> +		/*
>> +		 * Note we should never see this because move_pages() should
>> +		 * have indicated a page couldn't migrate above.
>> +		 */
>> +		if (status < 0) {
>> +			printf("Page didn't migrate, error %d\n", status);
>> +			return -2;
>> +		}
>> +
>>  		tmp = n2;
>>  		n2 = n1;
>>  		n1 = tmp;
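For reference, here is a rough sketch of how the test could count pages that failed to migrate from the move_pages() status array instead of asserting. This is not the actual follow-up patch: count_failed() and move_one_page() are hypothetical helpers, the sketch calls the raw syscall to avoid depending on libnuma, and MPOL_MF_MOVE is defined locally (value taken from include/uapi/linux/mempolicy.h) on the assumption that <numaif.h> may not be installed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Assumed value from include/uapi/linux/mempolicy.h, so the sketch
 * does not need libnuma's <numaif.h>. */
#ifndef MPOL_MF_MOVE
#define MPOL_MF_MOVE (1 << 1)
#endif

/*
 * Hypothetical helper: count the entries of a move_pages() status
 * array that hold a negative errno, i.e. pages that did not migrate.
 */
size_t count_failed(const int *status, size_t n)
{
	size_t failed = 0;

	assert(status != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (status[i] < 0)
			failed++;
	return failed;
}

/*
 * Hypothetical helper: ask the kernel to move the single page at
 * @page to @node. The per-page result lands in *status (the node the
 * page is on, or a negative errno). Returns the raw syscall result:
 * -1 with errno set on error, otherwise a non-negative value where a
 * positive number counts pages that were not migrated.
 */
long move_one_page(void *page, int node, int *status)
{
	void *pages[1] = { page };
	int nodes[1] = { node };

#ifdef SYS_move_pages
	/* pid 0 means the calling process. */
	return syscall(SYS_move_pages, 0, 1UL, pages, nodes, status,
		       MPOL_MF_MOVE);
#else
	(void)pages;
	(void)nodes;
	(void)status;
	errno = ENOSYS;
	return -1;
#endif
}
```

A caller in migrate() would treat a return of -1 as a hard error and add count_failed(&status, 1) to a running total that gets reported at the end, rather than returning -2 on the first unmigrated page.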