On 09.08.23 06:21, Alistair Popple wrote:
Ryan Roberts <ryan.roberts@xxxxxxx> writes:
On 07/08/2023 13:41, Alistair Popple wrote:
Ryan Roberts <ryan.roberts@xxxxxxx> writes:
On 07/08/2023 07:39, Alistair Popple wrote:
The migration selftest was only checking the return code and not the
status array for migration success/failure. Update the test to check
both. This uncovered a bug in the return code handling of
do_pages_move().
Also disable NUMA balancing as that can lead to unexpected migration
failures.
Signed-off-by: Alistair Popple <apopple@xxxxxxxxxx>
Suggested-by: Ryan Roberts <ryan.roberts@xxxxxxx>
---
Ryan, this will still cause the test to fail if a migration failed. I
was unable to reproduce a migration failure for any cases on my system
once I disabled NUMA balancing though so I'd be curious if you are
still seeing failures with this patch applied. AFAIK there shouldn't
be anything else that would be causing migration failure so would like
to know what is causing failures. Thanks!
Hi Alistair,
Afraid I'm still seeing unmigrated pages when running with these 2 patches:
# RUN migration.shared_anon ...
Didn't migrate 1 pages
# migration.c:183:shared_anon:Expected migrate(ptr, self->n1, self->n2) (-2) == 0 (0)
# shared_anon: Test terminated by assertion
# FAIL migration.shared_anon
not ok 2 migration.shared_anon
I added some instrumentation; it usually fails on the second time
through the loop in migrate() but I've also seen it fail the first
time. Never seen it get though 2 iterations successfully though.
Interesting. I guess migration failure is always possible for various
reasons so I will update the test to report the number of failed
migrations rather than making it a test failure.
I find it odd that migration always succeeds for the private_anon and
private_anon_thp cases, but fails for the fork case. I guess there is something
about the page being RO (for CoW) or having higher mapcount/refcount that causes
the migration to fail?
But the fork case uses a shared mapping, so there shouldn't be any
RO/CoW AFAIK. A higher refcount than expected would cause migration
failure, but there's nothing I can think of that would be causing that
now NUMA balancing is disabled in the test (that caused migration
failures for me in the private cases due to the concurrent migration
attempts).
Yes, we do have MAP_SHARED | MAP_ANONYMOUS, which is just shmem with
extra-steps.
In add_page_for_migration(), we do have
if (page_mapcount(page) > 1 && !migrate_all)
but the test seems to always specify MPOL_MF_MOVE_ALL when calling
move_pages(), so it should be ignored.
It would be helpful to dump the page when migration fails in that case.
Note that in add_page_for_migration(), we perform a follow_page() that
used to accidentally honor NUMA hinting faults. Fixed by
commit 3c471b04db7604974e3bea45e2d9c7c27a905536
Author: David Hildenbrand <david@xxxxxxxxxx>
Date: Thu Aug 3 16:32:02 2023 +0200
mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
in mm-unstable.
Maybe that fix does no longer require you to disable NUMA hinting for
this test case. Maybe you're lucky and it resolves the shared_anon case
as well, but I somewhat doubt it :)
It doesn't really explain why the shared case would fail, though...
--
Cheers,
David / dhildenb