On Wed, Sep 27, 2023 at 10:18 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote: > > On Wed, Sep 27, 2023 at 03:38:38PM -0700, Suren Baghdasaryan wrote: > > On Tue, Sep 26, 2023 at 10:25 PM Matthew Wilcox (Oracle) > > <willy@xxxxxxxxxxxxx> wrote: > > > It is usually safe to call wp_page_copy() under the VMA lock. The only > > > unsafe situation is when no anon_vma has been allocated for this VMA, > > > and we have to look at adjacent VMAs to determine if their anon_vma can > > > be shared. Since this happens only for the first COW of a page in this > > > VMA, the majority of calls to wp_page_copy() do not need to fall back > > > to the mmap_sem. > > > > > > Add vmf_anon_prepare() as an alternative to anon_vma_prepare() which > > > will return RETRY if we currently hold the VMA lock and need to allocate > > > an anon_vma. This lets us drop the check in do_wp_page(). > > > > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx> > > > --- > > > mm/memory.c | 39 ++++++++++++++++++++++++++------------- > > > 1 file changed, 26 insertions(+), 13 deletions(-) > > > > > > diff --git a/mm/memory.c b/mm/memory.c > > > index 97f860d6cd2a..cff78c496728 100644 > > > --- a/mm/memory.c > > > +++ b/mm/memory.c > > > @@ -3042,6 +3042,21 @@ static inline void wp_page_reuse(struct vm_fault *vmf) > > > count_vm_event(PGREUSE); > > > } > > > > > > +static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf) > > > +{ > > > + struct vm_area_struct *vma = vmf->vma; > > > + > > > + if (likely(vma->anon_vma)) > > > + return 0; > > > + if (vmf->flags & FAULT_FLAG_VMA_LOCK) { > > > > I don't think the above condition will happen today because > > lock_vma_under_rcu() returns NULL and do_page_fault() falls back to > > taking mmap_lock when !vma->anon_vma > > (https://elixir.bootlin.com/linux/v6.6-rc3/source/mm/memory.c#L5428). > > We would need to narrow down that check in lock_vma_under_rcu() to > > make this work here. > > That's only for anon VMAs. For file-backed VMAs, we can get here ... > > handle_pte_fault() > if (vmf->flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) { > if (!pte_write(entry)) > return do_wp_page(vmf); > > ie we we have a MAP_PRIVATE of a file, first take a read-fault on it, > then write to it. That causes us to allocate an anon page in this > file-backed VMA, so we need an anon_vma to exist. Oh, sorry, I completely missed that this check was only for anon VMAs. Now it makes sense. Thanks!