On Thu, Apr 04, 2024 at 06:21:50PM +0100, Matthew Wilcox wrote:
> On Thu, Apr 04, 2024 at 10:17:26AM -0700, Lokesh Gidra wrote:
> > -	folio_move_anon_rmap(src_folio, dst_vma);
> > -	WRITE_ONCE(src_folio->index, linear_page_index(dst_vma, dst_addr));
> > -
> >  	src_pmdval = pmdp_huge_clear_flush(src_vma, src_addr, src_pmd);
> >  	/* Folio got pinned from under us. Put it back and fail the move. */
> >  	if (folio_maybe_dma_pinned(src_folio)) {
> > @@ -2270,6 +2267,9 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm
> >  		goto unlock_ptls;
> >  	}
> > 
> > +	folio_move_anon_rmap(src_folio, dst_vma);
> > +	WRITE_ONCE(src_folio->index, linear_page_index(dst_vma, dst_addr));
> > +
> 
> This use of WRITE_ONCE scares me.  We hold the folio locked.  Why do
> we need to use WRITE_ONCE?  Who's looking at folio->index without
> holding the folio lock?

Seems true, but maybe suitable for a separate patch to clean it up even
so?  We also have the other pte level which has the same WRITE_ONCE(),
so if we want to drop it we may want to drop both.

I just got started reading some of the new move code (Lokesh, apologies
for not being able to provide feedback earlier..), and I found one thing
unclear: the special handling of private file mappings that only exists
in the userfault context, and I didn't know why:

lock_vma():
        if (vma) {
                /*
                 * lock_vma_under_rcu() only checks anon_vma for private
                 * anonymous mappings. But we need to ensure it is assigned in
                 * private file-backed vmas as well.
                 */
                if (!(vma->vm_flags & VM_SHARED) && unlikely(!vma->anon_vma))
                        vma_end_read(vma);
                else
                        return vma;
        }

AFAIU even for generic users of lock_vma_under_rcu(), anon_vma must be
stable to be used.  It seems weird to me for this to be a
userfault-specific operation.

I was surprised how it worked for private file maps on faults, so I had
a look, and it seems we postpone that check until vmf_anon_prepare(),
which is already the CoW path.  So we do behave as I expected, but it
seems unnecessary to wait until that point?
Would something like below make it much cleaner for us?  As I just don't
yet see why userfault is special here.

Thanks,

===8<===
diff --git a/mm/memory.c b/mm/memory.c
index 984b138f85b4..d5cf1d31c671 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3213,10 +3213,8 @@ vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
 	if (likely(vma->anon_vma))
 		return 0;
-	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
-		vma_end_read(vma);
-		return VM_FAULT_RETRY;
-	}
+	/* We shouldn't try a per-vma fault at all if anon_vma isn't solid */
+	WARN_ON_ONCE(vmf->flags & FAULT_FLAG_VMA_LOCK);
 	if (__anon_vma_prepare(vma))
 		return VM_FAULT_OOM;
 	return 0;
@@ -5817,9 +5815,9 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	 * find_mergeable_anon_vma uses adjacent vmas which are not locked.
 	 * This check must happen after vma_start_read(); otherwise, a
 	 * concurrent mremap() with MREMAP_DONTUNMAP could dissociate the VMA
-	 * from its anon_vma.
+	 * from its anon_vma. This applies to both anon or private file maps.
 	 */
-	if (unlikely(vma_is_anonymous(vma) && !vma->anon_vma))
+	if (unlikely(!(vma->vm_flags & VM_SHARED) && !vma->anon_vma))
 		goto inval_end_read;
 
 	/* Check since vm_start/vm_end might change before we lock the VMA */
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index f6267afe65d1..61f21da77dcd 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -72,17 +72,8 @@ static struct vm_area_struct *lock_vma(struct mm_struct *mm,
 	struct vm_area_struct *vma;
 
 	vma = lock_vma_under_rcu(mm, address);
-	if (vma) {
-		/*
-		 * lock_vma_under_rcu() only checks anon_vma for private
-		 * anonymous mappings. But we need to ensure it is assigned in
-		 * private file-backed vmas as well.
-		 */
-		if (!(vma->vm_flags & VM_SHARED) && unlikely(!vma->anon_vma))
-			vma_end_read(vma);
-		else
-			return vma;
-	}
+	if (vma)
+		return vma;
 
 	mmap_read_lock(mm);
 	vma = find_vma_and_prepare_anon(mm, address);
-- 
2.44.0

-- 
Peter Xu