On Mon, Feb 12, 2024 at 7:20 AM Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> wrote: > > * Lokesh Gidra <lokeshgidra@xxxxxxxxxx> [240209 15:59]: > > On Fri, Feb 9, 2024 at 11:31 AM Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> wrote: > ... > > > > > > > > > +static struct vm_area_struct *lock_vma(struct mm_struct *mm, > > > > > > > > + unsigned long address, > > > > > > > > + bool prepare_anon) > > > > > > > > +{ > > > > > > > > + struct vm_area_struct *vma; > > > > > > > > + > > > > > > > > + vma = lock_vma_under_rcu(mm, address); > > > > > > > > + if (vma) { > > > > > > > > + /* > > > > > > > > + * lock_vma_under_rcu() only checks anon_vma for private > > > > > > > > + * anonymous mappings. But we need to ensure it is assigned in > > > > > > > > + * private file-backed vmas as well. > > > > > > > > + */ > > > > > > > > + if (prepare_anon && !(vma->vm_flags & VM_SHARED) && > > > > > > > > + !vma->anon_vma) > > > > > > > > + vma_end_read(vma); > > > > > > > > + else > > > > > > > > + return vma; > > > > > > > > + } > > > > > > > > + > > > > > > > > + mmap_read_lock(mm); > > > > > > > > + vma = vma_lookup(mm, address); > > > > > > > > + if (vma) { > > > > > > > > + if (prepare_anon && !(vma->vm_flags & VM_SHARED) && > > > > > > > > + anon_vma_prepare(vma)) { > > > > > > > > + vma = ERR_PTR(-ENOMEM); > > > > > > > > + } else { > > > > > > > > + /* > > > > > > > > + * We cannot use vma_start_read() as it may fail due to > > > > > > > > + * false locked (see comment in vma_start_read()). We > > > > > > > > + * can avoid that by directly locking vm_lock under > > > > > > > > + * mmap_lock, which guarantees that nobody can lock the > > > > > > > > + * vma for write (vma_start_write()) under us. > > > > > > > > + */ > > > > > > > > + down_read(&vma->vm_lock->lock); > > > > > > > > + } > > > > > > > > + } > > > > > > > > + > > > > > > > > + mmap_read_unlock(mm); > > > > > > > > + return vma; > > > > > > > > +} > > > > > > > > + > > > > > > > > +static void unlock_vma(struct vm_area_struct *vma) > > > > > > > > +{ > > > > > > > > + vma_end_read(vma); > > > > > > > > +} > > > > > > > > + > ... > > > > > The current implementation has a deadlock problem: > > > > thread 1 > > thread 2 > > > > vma_start_read(dst_vma) > > > > > > mmap_write_lock() > > > > vma_start_write(src_vma) > > vma_start_read(src_vma) fails > > mmap_read_lock() blocks > > > > > > vma_start_write(dst_vma) > > blocks > > > > > > I think the solution is to implement it this way: > > > > long find_and_lock_vmas(...) > > { > > dst_vma = lock_vma(mm, dst_start, true); > > if (IS_ERR(dst_vma)) > > return PTR_ERR(vma); > > > > src_vma = lock_vma_under_rcu(mm, src_start); > > if (!src_vma) { > > long err; > > if (mmap_read_trylock(mm)) { > > src_vma = vma_lookup(mm, src_start); > > if (src_vma) { > > > > down_read(&src_vma->vm_lock->lock); > > err = 0; > > } else { > > err = -ENOENT; > > } > > mmap_read_unlock(mm); > > } else { > > vma_end_read(dst_vma); > > err = lock_mm_and_find_vmas(...); > > if (!err) { > > Right now lock_mm_and_find_vmas() doesn't use the per-vma locking, so > you'd have to lock those here too. I'm sure you realise that, but this > is very convoluted. That's right. I realized that after I sent this email. > > > mmap_read_unlock(mm); > > } > > } > > return err; > > On contention you will now abort vs block. Is it? On contention mmap_read_trylock() will fail and we do the whole operation using lock_mm_and_find_vmas() which blocks on mmap_lock. Am I missing something? > > > } > > return 0; > > } > > > > Of course this would need defining lock_mm_and_find_vmas() regardless > > of CONFIG_PER_VMA_LOCK. I can also remove the prepare_anon condition > > in lock_vma(). > > You are adding a lot of complexity for a relatively rare case, which is > probably not worth optimising. > > I think you'd be better served by something like : > > if (likely(src_vma)) > return 0; > > /* Undo any locking */ > vma_end_read(dst_vma); > > /* Fall back to locking both in mmap_lock critical section */ Agreed on reduced complexity. But as Suren pointed out in one of his replies that lock_vma_under_rcu() may fail due to seq overflow. That's why lock_vma() uses vma_lookup() followed by direct down_read() on vma-lock. IMHO what we need here is exactly lock_mm_and_find_vmas() and the code can be further simplified as follows: err = lock_mm_and_find_vmas(...); if (!err) { down_read(dst_vma...); if (dst_vma != src_vma) down_read(src_vma....); mmap_read_unlock(mm); } return err; (another thing I'm fixing in the next version is avoiding locking the vma twice if both src_start and dst_start are in same vma) > mmap_read_lock(); > /* > * probably worth creating an inline function for a single vma > * find/populate that can be used in lock_vma() that does the anon vma > * population? > */ Good idea. This can simplify both lock_vma() as well as lock_mm_and_find_vmas(). > dst_vm = lock_vma_populate_anon(); > ... > > src_vma = lock_vma_under_rcu(mm, src_start); > ... > > > mmap_read_unlock(); > return 0; > > src_failed: > vma_end_read(dst_vma); > dst_failed: > mmap_read_unlock(); > return err; > > Thanks, > Liam > > ... >