On 8/1/24 15:11, David Hildenbrand wrote:
On 01.08.24 11:38, Dev Jain wrote:
On 8/1/24 14:12, David Hildenbrand wrote:
On 01.08.24 10:16, Dev Jain wrote:
I and Ryan had a discussion and we thought it would be best to get
feedback
from the community.
The migration mm selftest currently fails on arm64 for shared anon
mappings,
due to the following race:
Do you mean MAP_SHARED|MAP_ANON or MAP_PRIVATE|MAP_ANON_fork? Because
you note shmem below, I assume you mean MAP_SHARED|MAP_ANON
Yes.
Migration: Page fault:
try_to_migrate_one(): handle_pte_fault():
1. Nuke the PTE PTE has been deleted =>
do_pte_missing()
2. Mark the PTE for migration PTE has not been deleted
but is just not "present" => do_swap_page()
In filemap_fault_recheck_pte_none() we recheck under PTL to make sure
that a temporary pte_none() really was persistent pte_none() and not a
temporary pte_none() under PTL.
Should we do something similar in do_fault()? I see that we already do
something like that on the "!vma->vm_ops->fault" path.
But of course, there is a tradeoff between letting migration
(temporarily) fail and grabbing the PTL during page faults.
To dampen the tradeoff, we could do this in shmem_fault() instead? But
then, this would mean that we do this in all
kinds of vma->vm_ops->fault, only when we discover another reference
count race condition :) Doing this in do_fault()
should solve this once and for all. In fact, do_pte_missing() may call
do_anonymous_page() or do_fault(), and I just
noticed that the former already checks this using vmf_pte_changed().
What I am still missing is why this is (a) arm64 only; and (b) if this
is something we should really worry about. There are other reasons
(e.g., speculative references) why migration could temporarily fail,
does it happen that often that it is really something we have to worry
about?
(a) See discussion at [1]; I guess it passes on x86, which is quite
strange since the race is clearly arch-independent.
(b) On my machine, on an average in under 10 iterations of move_pages(),
it fails, which seems problematic to
me assuming it is passing on other arches, since 10 iterations means
this is failing very quickly.
[1]:
https://lore.kernel.org/all/9de42ace-dab1-5f60-af8a-26045ada27b9@xxxxxxx/