Re: [PATCH 2/2] selftests/migration: Disable NUMA balancing and check migration status

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 09.08.23 06:21, Alistair Popple wrote:

Ryan Roberts <ryan.roberts@xxxxxxx> writes:

On 07/08/2023 13:41, Alistair Popple wrote:

Ryan Roberts <ryan.roberts@xxxxxxx> writes:

On 07/08/2023 07:39, Alistair Popple wrote:
The migration selftest was only checking the return code and not the
status array for migration success/failure. Update the test to check
both. This uncovered a bug in the return code handling of
do_pages_move().

Also disable NUMA balancing as that can lead to unexpected migration
failures.

Signed-off-by: Alistair Popple <apopple@xxxxxxxxxx>
Suggested-by: Ryan Roberts <ryan.roberts@xxxxxxx>
---

Ryan, this will still cause the test to fail if a migration failed. I
was unable to reproduce a migration failure for any cases on my system
once I disabled NUMA balancing though so I'd be curious if you are
still seeing failures with this patch applied. AFAIK there shouldn't
be anything else that would be causing migration failure so would like
to know what is causing failures. Thanks!


Hi Alistair,

Afraid I'm still seeing unmigrated pages when running with these 2 patches:


#  RUN           migration.shared_anon ...
Didn't migrate 1 pages
# migration.c:183:shared_anon:Expected migrate(ptr, self->n1, self->n2) (-2) == 0 (0)
# shared_anon: Test terminated by assertion
#          FAIL  migration.shared_anon
not ok 2 migration.shared_anon


I added some instrumentation; it usually fails on the second time
through the loop in migrate() but I've also seen it fail the first
time. Never seen it get though 2 iterations successfully though.

Interesting. I guess migration failure is always possible for various
reasons so I will update the test to report the number of failed
migrations rather than making it a test failure.

I find it odd that migration always succeeds for the private_anon and
private_anon_thp cases, but fails for the fork case. I guess there is something
about the page being RO (for CoW) or having higher mapcount/refcount that causes
the migration to fail?

But the fork case uses a shared mapping, so there shouldn't be any
RO/CoW AFAIK. A higher refcount than expected would cause migration
failure, but there's nothing I can think of that would be causing that
now NUMA balancing is disabled in the test (that caused migration
failures for me in the private cases due to the concurrent migration
attempts).

Yes, we do have MAP_SHARED | MAP_ANONYMOUS, which is just shmem with extra-steps.

In add_page_for_migration(), we do have

	if (page_mapcount(page) > 1 && !migrate_all)

but the test seems to always specify MPOL_MF_MOVE_ALL when calling move_pages(), so it should be ignored.

It would be helpful to dump the page when migration fails in that case.


Note that in add_page_for_migration(), we perform a follow_page() that used to accidentally honor NUMA hinting faults. Fixed by

commit 3c471b04db7604974e3bea45e2d9c7c27a905536
Author: David Hildenbrand <david@xxxxxxxxxx>
Date:   Thu Aug 3 16:32:02 2023 +0200

    mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT

in mm-unstable.


Maybe that fix does no longer require you to disable NUMA hinting for this test case. Maybe you're lucky and it resolves the shared_anon case as well, but I somewhat doubt it :)

It doesn't really explain why the shared case would fail, though...

--
Cheers,

David / dhildenb





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux