Re: [PATCH] mm/thp: flush file for !is_shmem PageDirty() case in collapse_file()

Song Liu <songliubraving@xxxxxx> · Thu, 31 Oct 2019 04:55:29 +0000

On Oct 30, 2019, at 6:51 PM, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:



On Wed, 30 Oct 2019 13:07:36 -0700 Song Liu <songliubraving@xxxxxx> wrote:



For non-shmem file THPs, khugepaged only collapses read only .text mapping

(VM_DENYWRITE). These pages should not be dirty except the case where the

file hasn't been flushed since first write.



Call filemap_flush() in collapse_file() to accelerate the write back in

such cases.



Also add warning if PageDirty() triggered for pages from readahead path.



Reported-and-tested-by: 
syzbot+efb9e48b9fbdc49bb34a@xxxxxxxxxxxxxxxxxxxxxxxxx




If sysbot reported something then we definitely want to be able to

review that report when reviewing the patch!  So please do include a

Link: for that sort of thing.



A bit of sleuthing leads me to

http://lkml.kernel.org/r/000000000000c50fd70595fdd5b2@xxxxxxxxxx



which shows that this patch is in fact a fix against

mmthp-recheck-each-page-before-collapsing-file-thp.patch.  This very

important information wasn't in the changelog.



And this patch doesn't really fix the sysbot hang, does it?  The patch

restores the flush which was removed in order to fix the sysbot hang.



In general, please do try to include all this sort of information when

preparing changelogs.




Sorry
 for the confusion. This one is a bit tricky.



syzbot
 reported hang issue with filemap_flush(), which is already fixed

by
 previous patch which removes filemap_flush(). 



This
 patch tries to add filemap_flush() back, in a proper way. So this 

patch
 is an improvement, not a fix. That's why I didn't include the 

fixes
 tag. 



However,
 I used syzbot to test this patch, by replying "#syz test: ...". 

to
 the previous report (the one you found). I didn't cc the mail list

for
 those tests. syzbot did find bug with an earlier version, and gave 

green
 light for this version. This is the reason I would like to give 

syzbot
 credit with the tag. Maybe I should just make it "Tested-by: 

syzbot"?

--- a/mm/khugepaged.c

+++ b/mm/khugepaged.c

@@ -1601,6 +1601,33 @@ static void collapse_file(struct mm_struct *mm,

result
 = SCAN_FAIL;

goto
 xa_unlocked;

}

+ if (WARN_ON_ONCE(PageDirty(page)))
 {




I'm not understanding what guarantees this.  Can't another process

which has the file open for writing come in and dirty the page after

the readahead has completed and before this process locks the page?




For
 non-shmem file, khugepaged only work for VM_DENYWRITE mappings. 

Therefore,
 the page cannot be opened for write by another process.






+
/*

+  *
 page from readahead should not

+  *
 be dirty. Show warning if this

+  *
 somehow happens.

+  */

+ result
 = SCAN_FAIL;

+ goto
 out_unlock;

+ }

+ } else if (PageDirty(page)) {

+ /*

+  *
 khugepaged only works on read-only fd,

+  *
 so this page is dirty because it hasn't

+  *
 been flushed since first write. There

+  *
 won't be new dirty pages.

+  *

+  *
 Trigger async flush here and hope the

+  *
 writeback is done when khugepaged

+  *
 revisits this page.

+  *

+  *
 This is a one-off situation. We are not

+  *
 forcing writeback in loop.

+  */

+ xas_unlock_irq(&xas);

+ filemap_flush(mapping);

+ result
 = SCAN_FAIL;

+ goto xa_unlocked;

} else if (trylock_page(page)) {

get_page(page);

xas_unlock_irq(&xas);




The patch mmthp-recheck-each-page-before-collapsing-file-thp.patch has

undergone quite a bit of churn so I don't think it should be mainlined

without more testing and review.  But it fixes a significant issue.  So

could the appropriate developers please take some time to recheck and

retest it all?




We
 are running production tests with previous version (no filemap_flush). 

syzbot
 also gives green light to that one. Maybe we should ship the 

version
 without filemap_flush with 5.4, and this improvement with 5.5?



This
 improvement also passed syzbot's test. If this looks good, we will 

also
 add this to production tests. 



Thanks,

Song