Recently, I got some bugreports major page fault takes several seconds sometime. When I review drop mmap_sem logic, I found several bugs. CPU 1 CPU 2 mm_populate for () .. ret = populate_vma_page_range __get_user_pages faultin_page handle_mm_fault filemap_fault do_async_mmap_readahead shrink_page_list pageout SetPageReclaim(=SetPageReadahead) writepage SetPageWriteback if (PageReadahead(page)) maybe_unlock_mmap_for_io up_read(mmap_sem) page_cache_async_readahead() if (PageWriteback(page)) return; Here, since ret from populate_vma_page_range is zero, the loop continue to run with same address with previous iteration. It will repeat the loop until the page's writeout is done(ie, PG_writeback or PG_reclaim is clear). We could fix the above specific case via adding PageWriteback ret = populate_vma_page_range ... ... filemap_fault do_async_mmap_readahead if (!PageWriteback(page) && PageReadahead(page)) maybe_unlock_mmap_for_io up_read(mmap_sem) page_cache_async_readahead() if (PageWriteback(page)) return; Furthermore, to prevent potential issues caused by sharing PG_readahead with PG_reclaim, let's make page flag wrapper for PageReadahead with description. With that, we could remove PageWriteback check in page_cache_async_readahead, which is more clear for maintenance/ readability. Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations") Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx> --- include/linux/page-flags.h | 28 ++++++++++++++++++++++++++-- mm/readahead.c | 6 ------ 2 files changed, 26 insertions(+), 8 deletions(-) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 1bf83c8fcaa7..f91a9b2a49bd 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -363,8 +363,32 @@ PAGEFLAG(MappedToDisk, mappedtodisk, PF_NO_TAIL) /* PG_readahead is only used for reads; PG_reclaim is only for writes */ PAGEFLAG(Reclaim, reclaim, PF_NO_TAIL) TESTCLEARFLAG(Reclaim, reclaim, PF_NO_TAIL) -PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND) - TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND) + +SETPAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND) +CLEARPAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND) + +/* + * Since PG_readahead is shared with PG_reclaim of the page flags, + * PageReadahead should double check whether it's readahead marker + * or PG_reclaim. It could be done by PageWriteback check because + * PG_reclaim is always with PG_writeback. + */ +static inline int PageReadahead(struct page *page) +{ + VM_BUG_ON_PGFLAGS(PageCompound(page), page); + + return (page->flags & (1UL << PG_reclaim | 1UL << PG_writeback)) == + (1UL << PG_reclaim); +} + +/* Clear PG_readahead only if it's PG_readahead, not PG_reclaim */ +static inline int TestClearPageReadahead(struct page *page) +{ + VM_BUG_ON_PGFLAGS(PageCompound(page), page); + + return !PageWriteback(page) || + test_and_clear_bit(PG_reclaim, &page->flags); +} #ifdef CONFIG_HIGHMEM /* diff --git a/mm/readahead.c b/mm/readahead.c index 2fe72cd29b47..85b15e5a1d7b 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -553,12 +553,6 @@ page_cache_async_readahead(struct address_space *mapping, if (!ra->ra_pages) return; - /* - * Same bit is used for PG_readahead and PG_reclaim. - */ - if (PageWriteback(page)) - return; - ClearPageReadahead(page); /* -- 2.25.0.265.gbab2e86ba0-goog