* Lokesh Gidra <lokeshgidra@xxxxxxxxxx> [240209 15:59]:
> On Fri, Feb 9, 2024 at 11:31 AM Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> wrote:

...

> > > > > > > +static struct vm_area_struct *lock_vma(struct mm_struct *mm,
> > > > > > > +					       unsigned long address,
> > > > > > > +					       bool prepare_anon)
> > > > > > > +{
> > > > > > > +	struct vm_area_struct *vma;
> > > > > > > +
> > > > > > > +	vma = lock_vma_under_rcu(mm, address);
> > > > > > > +	if (vma) {
> > > > > > > +		/*
> > > > > > > +		 * lock_vma_under_rcu() only checks anon_vma for private
> > > > > > > +		 * anonymous mappings. But we need to ensure it is assigned in
> > > > > > > +		 * private file-backed vmas as well.
> > > > > > > +		 */
> > > > > > > +		if (prepare_anon && !(vma->vm_flags & VM_SHARED) &&
> > > > > > > +		    !vma->anon_vma)
> > > > > > > +			vma_end_read(vma);
> > > > > > > +		else
> > > > > > > +			return vma;
> > > > > > > +	}
> > > > > > > +
> > > > > > > +	mmap_read_lock(mm);
> > > > > > > +	vma = vma_lookup(mm, address);
> > > > > > > +	if (vma) {
> > > > > > > +		if (prepare_anon && !(vma->vm_flags & VM_SHARED) &&
> > > > > > > +		    anon_vma_prepare(vma)) {
> > > > > > > +			vma = ERR_PTR(-ENOMEM);
> > > > > > > +		} else {
> > > > > > > +			/*
> > > > > > > +			 * We cannot use vma_start_read() as it may fail due to
> > > > > > > +			 * false locked (see comment in vma_start_read()). We
> > > > > > > +			 * can avoid that by directly locking vm_lock under
> > > > > > > +			 * mmap_lock, which guarantees that nobody can lock the
> > > > > > > +			 * vma for write (vma_start_write()) under us.
> > > > > > > +			 */
> > > > > > > +			down_read(&vma->vm_lock->lock);
> > > > > > > +		}
> > > > > > > +	}
> > > > > > > +
> > > > > > > +	mmap_read_unlock(mm);
> > > > > > > +	return vma;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void unlock_vma(struct vm_area_struct *vma)
> > > > > > > +{
> > > > > > > +	vma_end_read(vma);
> > > > > > > +}
> > > > > > > +

...
> 
> The current implementation has a deadlock problem:
> 
> thread 1				thread 2
> 
> vma_start_read(dst_vma)
> 					mmap_write_lock()
> 					vma_start_write(src_vma)
> vma_start_read(src_vma) fails
> mmap_read_lock() blocks
> 					vma_start_write(dst_vma)
> 					blocks
> 
> I think the solution is to implement it this way:
> 
> long find_and_lock_vmas(...)
> {
> 	dst_vma = lock_vma(mm, dst_start, true);
> 	if (IS_ERR(dst_vma))
> 		return PTR_ERR(dst_vma);
> 
> 	src_vma = lock_vma_under_rcu(mm, src_start);
> 	if (!src_vma) {
> 		long err;
> 		if (mmap_read_trylock(mm)) {
> 			src_vma = vma_lookup(mm, src_start);
> 			if (src_vma) {
> 				down_read(&src_vma->vm_lock->lock);
> 				err = 0;
> 			} else {
> 				err = -ENOENT;
> 			}
> 			mmap_read_unlock(mm);
> 		} else {
> 			vma_end_read(dst_vma);
> 			err = lock_mm_and_find_vmas(...);
> 			if (!err) {

Right now lock_mm_and_find_vmas() doesn't use the per-vma locking, so
you'd have to lock those here too.  I'm sure you realise that, but this
is very convoluted.

> 				mmap_read_unlock(mm);
> 			}
> 		}
> 		return err;

On contention you will now abort vs block.

> 	}
> 	return 0;
> }
> 
> Of course this would need defining lock_mm_and_find_vmas() regardless
> of CONFIG_PER_VMA_LOCK. I can also remove the prepare_anon condition
> in lock_vma().

You are adding a lot of complexity for a relatively rare case, which is
probably not worth optimising.

I think you'd be better served by something like:

	if (likely(src_vma))
		return 0;

	/* Undo any locking */
	vma_end_read(dst_vma);

	/* Fall back to locking both in mmap_lock critical section */
	mmap_read_lock();
	/*
	 * probably worth creating an inline function for a single vma
	 * find/populate that can be used in lock_vma() that does the anon vma
	 * population?
	 */
	dst_vma = lock_vma_populate_anon();
	...
	src_vma = lock_vma_under_rcu(mm, src_start);
	...

	mmap_read_unlock();
	return 0;

src_failed:
	vma_end_read(dst_vma);
dst_failed:
	mmap_read_unlock();
	return err;

Thanks,
Liam

...