On Tue, Jun 4, 2024 at 5:25 PM Andrii Nakryiko <andrii@xxxxxxxxxx> wrote: > > Attempt to use RCU-protected per-VMA lock when looking up requested VMA > as much as possible, only falling back to mmap_lock if per-VMA lock > failed. This is done so that querying of VMAs doesn't interfere with > other critical tasks, like page fault handling. > > This has been suggested by mm folks, and we make use of a newly added > internal API that works like find_vma(), but tries to use per-VMA lock. > > We have two sets of setup/query/teardown helper functions with different > implementations depending on availability of per-VMA lock (conditioned > on CONFIG_PER_VMA_LOCK) to abstract per-VMA lock subtleties. > > When per-VMA lock is available, lookup is done under RCU, attempting to > take a per-VMA lock. If that fails, we fallback to mmap_lock, but then > proceed to unconditionally grab per-VMA lock again, dropping mmap_lock > immediately. In this configuration mmap_lock is never helf for long, > minimizing disruptions while querying. > > When per-VMA lock is compiled out, we take mmap_lock once, query VMAs > using find_vma() API, and then unlock mmap_lock at the very end once as > well. In this setup we avoid locking/unlocking mmap_lock on every looked > up VMA (depending on query parameters we might need to iterate a few of > them). > > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx> > --- > fs/proc/task_mmu.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 46 insertions(+) > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > index 614fbe5d0667..140032ffc551 100644 > --- a/fs/proc/task_mmu.c > +++ b/fs/proc/task_mmu.c > @@ -388,6 +388,49 @@ static int pid_maps_open(struct inode *inode, struct file *file) > PROCMAP_QUERY_VMA_FLAGS \ > ) > > +#ifdef CONFIG_PER_VMA_LOCK > +static int query_vma_setup(struct mm_struct *mm) > +{ > + /* in the presence of per-VMA lock we don't need any setup/teardown */ > + return 0; > +} > + > +static void query_vma_teardown(struct mm_struct *mm, struct vm_area_struct *vma) > +{ > + /* in the presence of per-VMA lock we need to unlock vma, if present */ > + if (vma) > + vma_end_read(vma); > +} > + > +static struct vm_area_struct *query_vma_find_by_addr(struct mm_struct *mm, unsigned long addr) > +{ > + struct vm_area_struct *vma; > + > + /* try to use less disruptive per-VMA lock */ > + vma = find_and_lock_vma_rcu(mm, addr); > + if (IS_ERR(vma)) { > + /* failed to take per-VMA lock, fallback to mmap_lock */ > + if (mmap_read_lock_killable(mm)) > + return ERR_PTR(-EINTR); > + > + vma = find_vma(mm, addr); > + if (vma) { > + /* > + * We cannot use vma_start_read() as it may fail due to > + * false locked (see comment in vma_start_read()). We > + * can avoid that by directly locking vm_lock under > + * mmap_lock, which guarantees that nobody can lock the > + * vma for write (vma_start_write()) under us. > + */ > + down_read(&vma->vm_lock->lock); Hi Andrii, The above pattern of locking VMA under mmap_lock and then dropping mmap_lock is becoming more common. Matthew had an RFC proposal for an API to do this here: https://lore.kernel.org/all/ZivhG0yrbpFqORDw@xxxxxxxxxxxxxxxxxxxx/. It might be worth reviving that discussion. > + } > + > + mmap_read_unlock(mm); Later on in your code you are calling get_vma_name() which might call anon_vma_name() to retrieve user-defined VMA name. After this patch this operation will be done without holding mmap_lock, however per https://elixir.bootlin.com/linux/latest/source/include/linux/mm_types.h#L582 this function has to be called with mmap_lock held for read. Indeed with debug flags enabled you should hit this assertion: https://elixir.bootlin.com/linux/latest/source/mm/madvise.c#L96. > + } > + > + return vma; > +} > +#else > static int query_vma_setup(struct mm_struct *mm) > { > return mmap_read_lock_killable(mm); > @@ -402,6 +445,7 @@ static struct vm_area_struct *query_vma_find_by_addr(struct mm_struct *mm, unsig > { > return find_vma(mm, addr); > } > +#endif > > static struct vm_area_struct *query_matching_vma(struct mm_struct *mm, > unsigned long addr, u32 flags) > @@ -441,8 +485,10 @@ static struct vm_area_struct *query_matching_vma(struct mm_struct *mm, > skip_vma: > /* > * If the user needs closest matching VMA, keep iterating. > + * But before we proceed we might need to unlock current VMA. > */ > addr = vma->vm_end; > + vma_end_read(vma); /* no-op under !CONFIG_PER_VMA_LOCK */ > if (flags & PROCMAP_QUERY_COVERING_OR_NEXT_VMA) > goto next_vma; > no_vma: > -- > 2.43.0 >