On Wed, Oct 30, 2019 at 06:51:15PM -0700, Andrew Morton wrote: > To that end, here's the combination of > mmthp-recheck-each-page-before-collapsing-file-thp.patch and this > patch: > > From: Song Liu <songliubraving@xxxxxx> > Subject: mm,thp: recheck each page before collapsing file THP > > In collapse_file(), for !is_shmem case, current check cannot guarantee > the locked page is up-to-date. Specifically, xas_unlock_irq() should > not be called before lock_page() and get_page(); and it is necessary to > recheck PageUptodate() after locking the page. > > With this bug and CONFIG_READ_ONLY_THP_FOR_FS=y, madvise(HUGE)'ed .text > may contain corrupted data. This is because khugepaged mistakenly > collapses some not up-to-date sub pages into a huge page, and assumes > the huge page is up-to-date. This will NOT corrupt data in the disk, > because the page is read-only and never written back. Fix this by > properly checking PageUptodate() after locking the page. This check > replaces "VM_BUG_ON_PAGE(!PageUptodate(page), page);". > > Also, move PageDirty() check after locking the page. Current > khugepaged should not try to collapse dirty file THP, because it is > limited to read-only .text. Add a warning with the PageDirty() check > as it should not happen. This warning is added after page_mapping() > check, because if the page is truncated, it might be dirty. > > [songliubraving@xxxxxx: flush file for !is_shmem PageDirty() case in collapse_file()] > Link: http://lkml.kernel.org/r/20191030200736.3455046-1-songliubraving@xxxxxx > [songliubraving@xxxxxx: v4] > Link: http://lkml.kernel.org/r/20191022191006.411277-1-songliubraving@xxxxxx > [songliubraving@xxxxxx: fix deadlock in collapse_file()] > Link: http://lkml.kernel.org/r/20191028221414.3685035-1-songliubraving@xxxxxx > Link: http://lkml.kernel.org/r/20191018180345.4188310-1-songliubraving@xxxxxx > Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") > Signed-off-by: Song Liu <songliubraving@xxxxxx> > Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx> > Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> > Cc: Hugh Dickins <hughd@xxxxxxxxxx> > Cc: William Kucharski <william.kucharski@xxxxxxxxxx> > Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > --- > > mm/khugepaged.c | 49 +++++++++++++++++++++++++++++++++++++--------- > 1 file changed, 40 insertions(+), 9 deletions(-) > > --- a/mm/khugepaged.c~mmthp-recheck-each-page-before-collapsing-file-thp > +++ a/mm/khugepaged.c > @@ -1601,17 +1601,33 @@ static void collapse_file(struct mm_stru > result = SCAN_FAIL; > goto xa_unlocked; > } > - } else if (!PageUptodate(page)) { > - xas_unlock_irq(&xas); > - wait_on_page_locked(page); > - if (!trylock_page(page)) { > - result = SCAN_PAGE_LOCK; > - goto xa_unlocked; > + if (WARN_ON_ONCE(PageDirty(page))) { > + /* > + * page from readahead should not > + * be dirty. Show warning if this > + * somehow happens. > + */ > + result = SCAN_FAIL; > + goto out_unlock; I'm not sure we need this check here. We have a dirty check on the locked page below, and it's not clear why there is a suspicion of readahead producing dirty pages...? > } > - get_page(page); > } else if (PageDirty(page)) { > + /* > + * khugepaged only works on read-only fd, > + * so this page is dirty because it hasn't > + * been flushed since first write. There > + * won't be new dirty pages. > + * > + * Trigger async flush here and hope the > + * writeback is done when khugepaged > + * revisits this page. > + * > + * This is a one-off situation. We are not > + * forcing writeback in loop. > + */ > + xas_unlock_irq(&xas); > + filemap_flush(mapping); > result = SCAN_FAIL; > - goto xa_locked; > + goto xa_unlocked; We should do this to improve the file THP success rate. But we shouldn't conflate it with the data corruption bug fixed in the hunks below: > } else if (trylock_page(page)) { > get_page(page); > xas_unlock_irq(&xas); > @@ -1626,7 +1642,12 @@ static void collapse_file(struct mm_stru > * without racing with truncate. > */ > VM_BUG_ON_PAGE(!PageLocked(page), page); > - VM_BUG_ON_PAGE(!PageUptodate(page), page); > + > + /* double check the page is up to date */ > + if (unlikely(!PageUptodate(page))) { > + result = SCAN_FAIL; > + goto out_unlock; > + } This... > @@ -1642,6 +1663,16 @@ static void collapse_file(struct mm_stru > goto out_unlock; > } > > + if (!is_shmem && PageDirty(page)) { > + /* > + * khugepaged only works on read-only fd, so this > + * page is dirty because it hasn't been flushed > + * since first write. > + */ > + result = SCAN_FAIL; > + goto out_unlock; > + } ...and this are pretty urgent chunks because they protect the integrity of the page cache. They should go in asap. So I suggest we do two patches instead: 1. These two locked page checks to fix the corruption in 5.4 2. The filemap_flush() bit to improve THP coverage in 5.5