On Tue, Apr 18, 2023 at 04:21:13PM +0200, David Hildenbrand wrote: > Staring at the comment "Recheck VMA as permissions can change since > migration started" in remove_migration_pte() can result in confusion, > because if the source PTE/PMD indicates write permissions, then there > should be no need to check VMA write permissions when restoring migration > entries or PTE-mapping a PMD. > > Commit d3cb8bf6081b ("mm: migrate: Close race between migration completion > and mprotect") introduced the maybe_mkwrite() handling in > remove_migration_pte() in 2014, stating that a race between mprotect() and > migration finishing would be possible, and that we could end up with > a writable PTE that should be readable. > > However, mprotect() code first updates vma->vm_flags / vma->vm_page_prot > and then walks the page tables to (a) set all present writable PTEs to > read-only and (b) convert all writable migration entries to readable > migration entries. While walking the page tables and modifying the > entries, migration code has to grab the PT locks to synchronize against > concurrent page table modifications. Makes sense to me. > > Assuming migration would find a writable migration entry (while holding > the PT lock) and replace it with a writable present PTE, surely mprotect() > code didn't stumble over the writable migration entry yet (converting it > into a readable migration entry) and would instead wait for the PT lock to > convert the now present writable PTE into a read-only PTE. As mprotect() > didn't finish yet, the behavior is just like migration didn't happen: a > writable PTE will be converted to a read-only PTE. > > So it's fine to rely on the writability information in the source > PTE/PMD and not recheck against the VMA as long as we're holding the PT > lock to synchronize with anyone who concurrently wants to downgrade write > permissions (like mprotect()) by first adjusting vma->vm_flags / > vma->vm_page_prot to then walk over the page tables to adjust the page > table entries. > > Running test cases that should reveal such races -- mprotect(PROT_READ) > racing with page migration or THP splitting -- for multiple hours did > not reveal an issue with this cleanup. > > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> > Cc: Peter Xu <peterx@xxxxxxxxxx> > Signed-off-by: David Hildenbrand <david@xxxxxxxxxx> > --- > > This is a follow-up cleanup to [1]: > [PATCH v1 RESEND 0/6] mm: (pte|pmd)_mkdirty() should not > unconditionally allow for write access > > I wanted to be a bit careful and write some test cases to convince myself > that I am not missing something important. Of course, there is still the > possibility that my test cases are buggy ;) > > Test cases I'm running: > https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/test_mprotect_migration.c > https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/test_mprotect_thp_split.c > > > [1] https://lkml.kernel.org/r/20230411142512.438404-1-david@xxxxxxxxxx > > --- > mm/huge_memory.c | 4 ++-- > mm/migrate.c | 5 +---- > 2 files changed, 3 insertions(+), 6 deletions(-) > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index c23fa39dec92..624671aaa60d 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -2234,7 +2234,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, > } else { > entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot)); > if (write) > - entry = maybe_mkwrite(entry, vma); > + entry = pte_mkwrite(entry); This is another change besides page migration. I also don't know why it's needed, but it's there since day 1 of thp split in eef1b3ba053, so maybe worthwhile to copy Kirill too (which I did). > if (anon_exclusive) > SetPageAnonExclusive(page + i); > if (!young) > @@ -3271,7 +3271,7 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new) > if (pmd_swp_soft_dirty(*pvmw->pmd)) > pmde = pmd_mksoft_dirty(pmde); > if (is_writable_migration_entry(entry)) > - pmde = maybe_pmd_mkwrite(pmde, vma); > + pmde = pmd_mkwrite(pmde); > if (pmd_swp_uffd_wp(*pvmw->pmd)) > pmde = pmd_mkuffd_wp(pmde); > if (!is_migration_entry_young(entry)) > diff --git a/mm/migrate.c b/mm/migrate.c > index 5d95e09b1618..02cace7955d4 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -213,16 +213,13 @@ static bool remove_migration_pte(struct folio *folio, > if (pte_swp_soft_dirty(*pvmw.pte)) > pte = pte_mksoft_dirty(pte); > > - /* > - * Recheck VMA as permissions can change since migration started > - */ > entry = pte_to_swp_entry(*pvmw.pte); > if (!is_migration_entry_young(entry)) > pte = pte_mkold(pte); > if (folio_test_dirty(folio) && is_migration_entry_dirty(entry)) > pte = pte_mkdirty(pte); > if (is_writable_migration_entry(entry)) > - pte = maybe_mkwrite(pte, vma); > + pte = pte_mkwrite(pte); > else if (pte_swp_uffd_wp(*pvmw.pte)) > pte = pte_mkuffd_wp(pte); > > -- > 2.39.2 > -- Peter Xu