On Fri, Nov 30, 2018 at 02:58:11PM -0500, Josef Bacik wrote:
> Currently we only drop the mmap_sem if there is contention on the page
> lock. The idea is that we issue readahead and then go to lock the page
> while it is under IO and we want to not hold the mmap_sem during the IO.
>
> The problem with this is the assumption that the readahead does
> anything. In the case that the box is under extreme memory or IO
> pressure we may end up not reading anything at all for readahead, which
> means we will end up reading in the page under the mmap_sem.

I'd also add that even if readahead did something, the block request
queues could be contended enough that merely submitting the io could
become IO bound if it has to wait for in-flight requests. Not really a
concern with cgroup IO control, but this has always somewhat defeated
the original purpose of the mmap_sem dropping (avoiding serializing
page faults when there is a writer queued).

> Instead rework filemap fault path to drop the mmap sem at any point that
> we may do IO or block for an extended period of time. This includes
> while issuing readahead, locking the page, or needing to call ->readpage
> because readahead did not occur. Then once we have a fully uptodate
> page we can return with VM_FAULT_RETRY and come back again to find our
> nicely in-cache page that was gotten outside of the mmap_sem.
>
> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>

Keeping the fpin throughout the fault handler makes things a lot
simpler than the -EAGAIN and wait_on_page_locked dance from earlier
versions. Nice.

Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>