On Fri, 30 Nov 2018 14:58:11 -0500 Josef Bacik <josef@xxxxxxxxxxxxxx> wrote: > Currently we only drop the mmap_sem if there is contention on the page > lock. The idea is that we issue readahead and then go to lock the page > while it is under IO and we want to not hold the mmap_sem during the IO. > > The problem with this is the assumption that the readahead does > anything. In the case that the box is under extreme memory or IO > pressure we may end up not reading anything at all for readahead, which > means we will end up reading in the page under the mmap_sem. Please describe here why this is considered to be a problem. Application stalling, I assume? Some description of in-the-field observations would be appropriate. ie, how serious is the problem whcih is being addressed. > Instead rework filemap fault path to drop the mmap sem at any point that > we may do IO or block for an extended period of time. This includes > while issuing readahead, locking the page, or needing to call ->readpage > because readahead did not occur. Then once we have a fully uptodate > page we can return with VM_FAULT_RETRY and come back again to find our > nicely in-cache page that was gotten outside of the mmap_sem. > > ... > > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -2304,28 +2304,44 @@ EXPORT_SYMBOL(generic_file_read_iter); > > #ifdef CONFIG_MMU > #define MMAP_LOTSAMISS (100) > +static struct file *maybe_unlock_mmap_for_io(struct file *fpin, > + struct vm_area_struct *vma, > + int flags) > +{ > + if (fpin) > + return fpin; > + if ((flags & (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT)) == > + FAULT_FLAG_ALLOW_RETRY) { > + fpin = get_file(vma->vm_file); > + up_read(&vma->vm_mm->mmap_sem); > + } > + return fpin; > +} A code comment would be nice. What it does and, especially, why it does it. > - if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) { > - put_page(page); > - return ret | VM_FAULT_RETRY; > + /* > + * We are open-coding lock_page_or_retry here because we want to do the > + * readpage if necessary while the mmap_sem is dropped. If there > + * happens to be a lock on the page but it wasn't being faulted in we'd > + * come back around without ALLOW_RETRY set and then have to do the IO > + * under the mmap_sem, which would be a bummer. Expanding on "a bummer" would help here ;) > + */ > + if (!trylock_page(page)) { > + fpin = maybe_unlock_mmap_for_io(fpin, vmf->vma, vmf->flags); > + if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT) > + goto out_retry; > + if (vmf->flags & FAULT_FLAG_KILLABLE) { > + if (__lock_page_killable(page)) { > + /* > + * If we don't have the right flags for > + * maybe_unlock_mmap_for_io to do it's thing we "its" > + * still need to drop the sem and return > + * VM_FAULT_RETRY so the upper layer checks the > + * signal and takes the appropriate action. > + */ > + if (!fpin) > + up_read(&vmf->vma->vm_mm->mmap_sem); > + goto out_retry; > + } > + } else > + __lock_page(page); > } > > /* Did it get truncated? */