On Wed, Feb 12, 2020 at 02:24:35PM -0800, Andrew Morton wrote:
> On Wed, 12 Feb 2020 11:53:22 -0800 Minchan Kim <minchan@xxxxxxxxxx> wrote:
>
> > > That's definitely wrong. It'll clear PageReclaim and then pretend it did
> > > nothing wrong.
> > >
> > > return !PageWriteback(page) ||
> > > test_and_clear_bit(PG_reclaim, &page->flags);
> > >
> >
> > Much better, Thanks for the review, Matthew!
> > If there is no objection, I will send two patches to Andrew.
> > One is PageReadahead strict, the other is limit retry from mm_populate.
>
> With much more detailed changelogs, please!
>
> This all seems rather screwy. if a page is under writeback then it is
> uptodate and we should be able to fault it in immediately.

Hi Andrew,

Will this description work as the cover letter? If so, I will add each
part below to the corresponding patch.

Subject: [PATCH 0/3] fixing mm_populate long stall

I got several reports that a major page fault sometimes takes several
seconds. When I reviewed the code that drops mmap_sem in the page fault
handler, I found several bugs.

        CPU 1                                                   CPU 2
mm_populate
for ()
        ..
        ret = populate_vma_page_range
                __get_user_pages
                        faultin_page
                                handle_mm_fault
                                        filemap_fault
                                                do_async_mmap_readahead
                                                                shrink_page_list
                                                                  pageout
                                                                    SetPageReclaim(=SetPageReadahead)
                                                                      writepage
                                                                        SetPageWriteback
                                                if (PageReadahead(page))
                                                        maybe_unlock_mmap_for_io
                                                                up_read(mmap_sem)
                                                        page_cache_async_readahead()
                                                                if (PageWriteback(page))
                                                                        return;

        Here, since ret from populate_vma_page_range is zero, the loop
        continues to run with the same address as the previous
        iteration. It repeats until the page's writeout is done
        (i.e., PG_writeback or PG_reclaim is cleared).

We could fix this specific case by adding a PageWriteback check. IOW,

        ret = populate_vma_page_range
                ...
                ...
                filemap_fault
                        do_async_mmap_readahead
                        if (!PageWriteback(page) && PageReadahead(page))
                                maybe_unlock_mmap_for_io
                                        up_read(mmap_sem)
                                page_cache_async_readahead()
                                        if (PageWriteback(page))
                                                return;

That is what [3/3] fixes.
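To make the stall above easier to see, here is a rough user-space model of the retry loop. It is only a sketch, not kernel code and not the actual patch: the toy_* names, the flag constants, and the "ticks until writeback completes" counter are all made up for illustration. The strict variant simply mirrors the `!PageWriteback(page) && PageReadahead(page)` condition from the trace above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the page flag bits. As in the kernel,
 * one bit serves as both the reclaim flag and the readahead marker. */
enum { TOY_PG_WRITEBACK = 1, TOY_PG_RECLAIM = 2 };

struct toy_page {
    unsigned flags;
    int writeback_ticks; /* retries until the modeled writeout finishes */
};

static bool toy_page_writeback(const struct toy_page *p)
{
    return (p->flags & TOY_PG_WRITEBACK) != 0;
}

/* Loose check: any page with the shared bit set looks like a readahead marker. */
static bool toy_page_readahead_loose(const struct toy_page *p)
{
    return (p->flags & TOY_PG_RECLAIM) != 0;
}

/* Strict check: a page under writeback is never treated as a readahead marker. */
static bool toy_page_readahead_strict(const struct toy_page *p)
{
    return !toy_page_writeback(p) && toy_page_readahead_loose(p);
}

/* One modeled fault attempt. Returns true when the fault completes,
 * false when "mmap_sem" was dropped and the caller has to retry. */
static bool toy_fault(struct toy_page *p, bool strict)
{
    bool ra = strict ? toy_page_readahead_strict(p)
                     : toy_page_readahead_loose(p);

    if (!ra)
        return true;
    /* Modeled async readahead: it bails out early on a writeback page
     * and leaves the marker set; otherwise it consumes the marker. */
    if (!toy_page_writeback(p))
        p->flags &= ~(unsigned)TOY_PG_RECLAIM;
    return false;
}

/* Modeled populate loop: retry the same address until the fault
 * completes, and report how many retries that took. */
static int toy_populate(struct toy_page *p, bool strict)
{
    int retries = 0;

    /* A writeback page must eventually finish, or the loose loop never ends. */
    assert(!toy_page_writeback(p) || p->writeback_ticks > 0);

    while (!toy_fault(p, strict)) {
        retries++;
        /* Time passes; when the writeout ends, both bits are cleared. */
        if (p->writeback_ticks > 0 && --p->writeback_ticks == 0)
            p->flags &= ~(unsigned)(TOY_PG_WRITEBACK | TOY_PG_RECLAIM);
    }
    return retries;
}
```

In this model, a page that is both under writeback and marked for reclaim keeps the loose variant retrying for as long as the writeout takes, while the strict variant completes on the first attempt. A genuine readahead marker (no writeback) costs one retry in either variant.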

Even though that fixes the problem effectively, a livelock is still
theoretically possible: the page at the faulting address could be
reclaimed and then reallocated as a readahead marker on another CPU
while the faulting process is retrying in mm_populate's loop. [2/3]
fixes such a livelock by limiting the retry count.

Besides PageWriteback, there is another hole that can livelock or hang
the process: ra_pages.

mm_populate
for ()
        ..
        ret = populate_vma_page_range
                __get_user_pages
                        faultin_page
                                handle_mm_fault
                                        filemap_fault
                                                do_async_mmap_readahead
                                                if (PageReadahead(page))
                                                        maybe_unlock_mmap_for_io
                                                                up_read(mmap_sem)
                                                        page_cache_async_readahead()
                                                                if (!ra->ra_pages)
                                                                        return;

It will repeat the loop until ra->ra_pages becomes non-zero.
[1/3] fixes this problem.

Jan Kara (1):
  mm: Don't bother dropping mmap_sem for zero size readahead

Minchan Kim (2):
  mm: fix long time stall from mm_populate
  mm: make PageReadahead more strict

 include/linux/page-flags.h | 28 ++++++++++++++++++++++++++--
 mm/filemap.c               |  2 +-
 mm/gup.c                   |  9 +++++++--
 mm/readahead.c             |  6 ------
 4 files changed, 34 insertions(+), 11 deletions(-)

-- 
2.25.0.225.g125e21ebc7-goog