On Tue, Jan 17, 2023 at 7:47 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Mon 09-01-23 12:53:23, Suren Baghdasaryan wrote:
> > Introduce lock_vma_under_rcu function to lookup and lock a VMA during
> > page fault handling. When VMA is not found, can't be locked or changes
> > after being locked, the function returns NULL. The lookup is performed
> > under RCU protection to prevent the found VMA from being destroyed before
> > the VMA lock is acquired. VMA lock statistics are updated according to
> > the results.
> > For now only anonymous VMAs can be searched this way. In other cases the
> > function returns NULL.
>
> Could you describe why only anonymous vmas are handled at this stage and
> what (roughly) has to be done to support other vmas? lock_vma_under_rcu
> doesn't seem to have any anonymous vma specific requirements AFAICS.

TBH I haven't spent too much time looking into file-backed page faults
yet, but a couple of tasks I can think of are:
- Ensure that all vma->vm_ops->fault() handlers do not rely on
  mmap_lock being read-locked;
- vma->vm_file freeing, like VMA freeing, will need to be done after an
  RCU grace period since page fault handlers use it. This will require
  some caution because simply adding it into __vm_area_free(), which is
  called via call_rcu(), would cause the corresponding fops->release()
  to be called asynchronously (see the sketch below). I had to solve
  this issue in the out-of-tree SPF implementation when an
  asynchronously called snd_pcm_release() was problematic.
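To illustrate that second point, here is a rough and completely
untested sketch of the naive approach, mirroring the vm_area_free() /
__vm_area_free() pair added earlier in this series with a hypothetical
fput() dropped in (not something I'm proposing, just what "simply
adding it" would look like):

static void __vm_area_free(struct rcu_head *head)
{
	struct vm_area_struct *vma = container_of(head, struct vm_area_struct,
						  vm_rcu);

	/*
	 * Hypothetical: dropping the vm_file reference from the RCU callback.
	 * The final fput(), and therefore fops->release(), would run some
	 * time after munmap()/exit_mmap() has already returned, i.e.
	 * asynchronously from userspace's point of view.
	 */
	if (vma->vm_file)
		fput(vma->vm_file);

	kmem_cache_free(vm_area_cachep, vma);
}

void vm_area_free(struct vm_area_struct *vma)
{
	/* Defer freeing so page fault handlers under RCU can still use vma. */
	call_rcu(&vma->vm_rcu, __vm_area_free);
}

That asynchronous final fput() is exactly what made snd_pcm_release()
unhappy in the out-of-tree SPF code, so we would likely need to drop
the file reference synchronously on the unmap path while keeping the
vm_file pointer valid for the RCU grace period, or something along
those lines.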
I'm sure I'm missing more potential issues and maybe Matthew and
Michel can pinpoint more things to resolve here?

>
> Also isn't lock_vma_under_rcu effectively find_read_lock_vma? Not that
> the naming is really the most important part but the rcu locking is
> internal to the function so why should we spread this implementation
> detail to the world...

I wanted the name to indicate that the lookup is done with no locks
held. But I'm open to suggestions.

>
> > Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> > ---
> >  include/linux/mm.h |  3 +++
> >  mm/memory.c        | 51 ++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 54 insertions(+)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index c464fc8a514c..d0fddf6a1de9 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -687,6 +687,9 @@ static inline void vma_assert_no_reader(struct vm_area_struct *vma)
> >  		      vma);
> >  }
> >
> > +struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> > +					  unsigned long address);
> > +
> >  #else /* CONFIG_PER_VMA_LOCK */
> >
> >  static inline void vma_init_lock(struct vm_area_struct *vma) {}
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 9ece18548db1..a658e26d965d 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -5242,6 +5242,57 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> >  }
> >  EXPORT_SYMBOL_GPL(handle_mm_fault);
> >
> > +#ifdef CONFIG_PER_VMA_LOCK
> > +/*
> > + * Lookup and lock a VMA under RCU protection. Returned VMA is guaranteed to be
> > + * stable and not isolated. If the VMA is not found or is being modified the
> > + * function returns NULL.
> > + */
> > +struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> > +					  unsigned long address)
> > +{
> > +	MA_STATE(mas, &mm->mm_mt, address, address);
> > +	struct vm_area_struct *vma, *validate;
> > +
> > +	rcu_read_lock();
> > +	vma = mas_walk(&mas);
> > +retry:
> > +	if (!vma)
> > +		goto inval;
> > +
> > +	/* Only anonymous vmas are supported for now */
> > +	if (!vma_is_anonymous(vma))
> > +		goto inval;
> > +
> > +	if (!vma_read_trylock(vma))
> > +		goto inval;
> > +
> > +	/* Check since vm_start/vm_end might change before we lock the VMA */
> > +	if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
> > +		vma_read_unlock(vma);
> > +		goto inval;
> > +	}
> > +
> > +	/* Check if the VMA got isolated after we found it */
> > +	mas.index = address;
> > +	validate = mas_walk(&mas);
> > +	if (validate != vma) {
> > +		vma_read_unlock(vma);
> > +		count_vm_vma_lock_event(VMA_LOCK_MISS);
> > +		/* The area was replaced with another one. */
> > +		vma = validate;
> > +		goto retry;
> > +	}
> > +
> > +	rcu_read_unlock();
> > +	return vma;
> > +inval:
> > +	rcu_read_unlock();
> > +	count_vm_vma_lock_event(VMA_LOCK_ABORT);
> > +	return NULL;
> > +}
> > +#endif /* CONFIG_PER_VMA_LOCK */
> > +
> >  #ifndef __PAGETABLE_P4D_FOLDED
> >  /*
> >   * Allocate p4d page table.
> > --
> > 2.39.0
>
> --
> Michal Hocko
> SUSE Labs
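One more note, in case it helps to see how the NULL return is meant to
be used: the caller side added later in this series looks roughly like
the sketch below (simplified, with the access checks and error handling
omitted; FAULT_FLAG_VMA_LOCK and the VMA_LOCK_* events are the ones
this series introduces). A NULL return simply means "fall back to the
usual mmap_lock-protected path":

	vma = lock_vma_under_rcu(mm, address);
	if (!vma)
		goto lock_mmap;		/* lookup or lock failed, fall back */

	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
	vma_read_unlock(vma);

	if (!(fault & VM_FAULT_RETRY)) {
		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
		goto done;
	}
	count_vm_vma_lock_event(VMA_LOCK_RETRY);

lock_mmap:
	mmap_read_lock(mm);
	/* ... existing mmap_lock-based fault handling ... */

So the worst case for any of the bail-out paths in lock_vma_under_rcu()
is that we fall back to exactly what we do today.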