Re: Race condition observed between page migration and page fault handling on arm64 machines

David Hildenbrand <david@xxxxxxxxxx> · Thu, 1 Aug 2024 15:48:22 +0200

On 01.08.24 15:43, Will Deacon wrote:
On Thu, Aug 01, 2024 at 03:26:57PM +0200, David Hildenbrand wrote:
On 01.08.24 15:13, David Hildenbrand wrote:
To dampen the tradeoff, we could do this in shmem_fault() instead? But
then, this would mean that we do this in all

kinds of vma->vm_ops->fault, only when we discover another reference
count race condition :) Doing this in do_fault()

should solve this once and for all. In fact, do_pte_missing() may call
do_anonymous_page() or do_fault(), and I just

noticed that the former already checks this using vmf_pte_changed().

What I am still missing is why this is (a) arm64 only; and (b) if this
is something we should really worry about. There are other reasons
(e.g., speculative references) why migration could temporarily fail,
does it happen that often that it is really something we have to worry
about?


(a) See discussion at [1]; I guess it passes on x86, which is quite
strange since the race is clearly arch-independent.

Yes, I think this is what we have to understand. Is the race simply less
likely to trigger on x86?

I would assume that it would trigger on any arch.

I just ran it on a x86 VM with 2 NUMA nodes and it also seems to work here.

Is this maybe related to deferred flushing? Such that the other CPU will
by accident just observe the !pte_none a little less likely?

But arm64 also usually defers flushes, right? At least unless
ARM64_WORKAROUND_REPEAT_TLBI is around. With that we never do deferred
flushes.

Bingo!

diff --git a/mm/rmap.c b/mm/rmap.c
index e51ed44f8b53..ce94b810586b 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -718,10 +718,7 @@ static void set_tlb_ubc_flush_pending(struct mm_struct
*mm, pte_t pteval,
   */
  static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags)
  {
-       if (!(flags & TTU_BATCH_FLUSH))
-               return false;
-
-       return arch_tlbbatch_should_defer(mm);
+       return false;
  }


On x86:

# ./migration
TAP version 13
1..1
# Starting 1 tests from 1 test cases.
#  RUN           migration.shared_anon ...
Didn't migrate 1 pages
# migration.c:170:shared_anon:Expected migrate(ptr, self->n1, self->n2) (-2)
== 0 (0)
# shared_anon: Test terminated by assertion
#          FAIL  migration.shared_anon
not ok 1 migration.shared_anon


It fails all of the time!

Nice work! I suppose that makes sense as, with the eager TLB
invalidation, the window between the other CPU faulting and the
migration entry being written is fairly wide.

Not sure about a fix though :/ It feels a bit overkill to add a new
invalid pte encoding just for this.

Something like that might make the test happy in most cases:

diff --git a/tools/testing/selftests/mm/migration.c b/tools/testing/selftests/mm/migration.c
index 6908569ef406..4c18bfc13b94 100644
--- a/tools/testing/selftests/mm/migration.c
+++ b/tools/testing/selftests/mm/migration.c
@@ -65,6 +65,7 @@ int migrate(uint64_t *ptr, int n1, int n2)
        int ret, tmp;
        int status = 0;
        struct timespec ts1, ts2;
+       int errors = 0;
 
        if (clock_gettime(CLOCK_MONOTONIC, &ts1))
                return -1;
@@ -79,12 +80,17 @@ int migrate(uint64_t *ptr, int n1, int n2)
                ret = move_pages(0, 1, (void **) &ptr, &n2, &status,
                                MPOL_MF_MOVE_ALL);
                if (ret) {
-                       if (ret > 0)
+                       if (ret > 0) {
+                               if (++errors < 100)
+                                       continue;
                                printf("Didn't migrate %d pages\n", ret);
-                       else
+                       } else {
                                perror("Couldn't migrate pages");
+                       }
                        return -2;
                }
+               /* Progress! */
+               errors = 0;
 
                tmp = n2;
                n2 = n1;


[root@localhost mm]# ./migration
TAP version 13
1..1
# Starting 1 tests from 1 test cases.
#  RUN           migration.shared_anon ...
#            OK  migration.shared_anon
ok 1 migration.shared_anon
# PASSED: 1 / 1 tests passed.
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0


--
Cheers,

David / dhildenb