On Tue, Nov 05, 2019 at 04:28:21PM +0100, Vlastimil Babka wrote:
> On 11/5/19 2:23 PM, Robert Stupp wrote:
> > "git bisect" led to a result.
> >
> > The offending merge commit is f91f2ee54a21404fbc633550e99d69d14c2478f2
> > "Merge branch 'akpm' (rest of patches from Andrew)".
> >
> > The first bad commit in the merged series of commits is
> > https://github.com/torvalds/linux/commit/6b4c9f4469819a0c1a38a0a4541337e0f9bf6c11
> > . a75d4c33377277b6034dd1e2663bce444f952c14, the commit before 6b4c9f44,
> > is good.
>
> Ah, great you could bisect this. CCing people from the commit
> 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")

Judging from Robert's stack captures, the task is not hung but
busy-looping in __mm_populate(). AFAICS, the only way this can occur
is if populate_vma_page_range() returns 0 and we don't advance the
iteration position (if it returned an error, we wouldn't reset nend
and move on to the next vma as ignore_errors is 1 for mlockall.)

populate_vma_page_range() returns 0 when the first page is not found
and faultin_page() returns -EBUSY (if it were processing pages, or if
the error from faultin_page() would be a different one, we would
return the number of pages processed or -error).

faultin_page() returns -EBUSY when VM_FAULT_RETRY is set, i.e. we
dropped the mmap_sem in order to initiate IO and require a retry. That
is consistent with the bisect result (new VM_FAULT_RETRY conditions).

At this point, regular page fault would retry with FAULT_FLAG_TRIED to
indicate that the mmap_sem cannot be dropped a second time. But this
mlock path doesn't set that flag and we can loop repeatedly. That is
something we probably need to fix with a FOLL_TRIED somewhere.

What I don't quite understand yet is why the fault path doesn't make
progress eventually. We must drop the mmap_sem without changing the
state in any way. How can we keep looping on the same page?
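
To make the loop described above easier to follow, here is a rough
user-space model of it. This is not the kernel code: every model_*
name, MODEL_PAGE_SIZE, the set_tried_on_retry knob and the max_iters
guard are made up for illustration, error paths other than -EBUSY are
left out, and the fault is assumed to ask for a retry every time
unless the caller says it has already tried once.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size, illustration only */

/*
 * Hypothetical stand-in for faultin_page(): the fault wants to drop the
 * lock and be retried (-EBUSY) unless the caller signals "already tried".
 */
int model_faultin_page(bool tried)
{
	return tried ? 0 : -EBUSY;
}

/*
 * Hypothetical stand-in for populate_vma_page_range(): returns the number
 * of pages processed. -EBUSY on the very first page shows up as 0.
 */
long model_populate_range(unsigned long start, unsigned long end, bool tried)
{
	long done = 0;
	unsigned long addr;

	for (addr = start; addr < end; addr += MODEL_PAGE_SIZE) {
		if (model_faultin_page(tried) == -EBUSY)
			return done;
		done++;
	}
	return done;
}

/*
 * Hypothetical stand-in for the __mm_populate() loop. Returns the number
 * of iterations needed to cover the range, or -1 if max_iters iterations
 * were used up without reaching the end (i.e. no forward progress).
 */
long model_mm_populate(unsigned long start, unsigned long len,
		       bool set_tried_on_retry, long max_iters)
{
	unsigned long end = start + len, nstart, nend;
	bool tried = false;
	long iters = 0;

	assert(len % MODEL_PAGE_SIZE == 0);

	for (nstart = start; nstart < end; nstart = nend) {
		long ret;

		if (iters++ >= max_iters)
			return -1;

		nend = end;
		ret = model_populate_range(nstart, nend, tried);

		/* A return of 0 leaves nend == nstart: no advance. */
		nend = nstart + ret * MODEL_PAGE_SIZE;

		/* Only remember the retry if the caller opted in. */
		tried = (ret == 0) && set_tried_on_retry;
	}
	return iters;
}
```

With set_tried_on_retry false, model_mm_populate(0, 4 * MODEL_PAGE_SIZE,
false, 1000) never advances nstart and gives up at the guard, which is
the busy loop in miniature. With it true, the second pass completes the
range. Whether a FOLL_TRIED-style flag is the right fix in the real
code is still the open question above; the model only shows why a
return of 0 without any "tried" state cannot make progress.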