On 10/5/21 10:58 AM, Hugh Dickins wrote:
On Tue, 5 Oct 2021, Rongwei Wang wrote:
Hi,
I have run our cases these two days to stress test new Patch #1. The new Patch
#1 mainly add filemap_invalidate_{un}lock before and after
truncate_pagecache(), basing on original Patch #1. And the crash has not
happened.
Now, I keep the original Patch #1, then adding the code below which suggested
by liu song (I'm not sure which one I should add in the next version,
Suggested-by or Signed-off-by? If you know, please remind me).
- if (filemap_nr_thps(inode->i_mapping))
+ if (filemap_nr_thps(inode->i_mapping)) {
+ filemap_invalidate_lock(inode->i_mapping);
truncate_pagecache(inode, 0);
+ filemap_invalidate_unlock(inode->i_mapping);
+ }
I won't NAK that patch; but I still believe it's unnecessary, and don't
see how it protects against all the races (collapse_file() does not use
that lock, whereas collapse_file() does use page lock). And if you're
hoping to fix 5.10, then you will have to backport those invalidate_lock
patches there too (they're really intended to protect hole-punching).
And the reason for keeping the original Patch #1 is mainly to fix the race
between collapse_file and truncate_pagecache. It seems necessary. Despite the
two-day test, I did not reproduce this race any more.
In addition, I also test the below method:
diff --git a/mm/truncate.c b/mm/truncate.c
index 3f47190f98a8..33604e4ce60a 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -210,8 +210,6 @@ invalidate_complete_page(struct address_space *mapping,
struct page *page)
int truncate_inode_page(struct address_space *mapping, struct page *page)
{
- VM_BUG_ON_PAGE(PageTail(page), page);
-
if (page->mapping != mapping)
return -EIO;
I am not very sure this VM_BUG_ON_PAGE(PageTail) is what Hugh means. And
the test results show that only removing this VM_BUG_ON_PAGE(PageTail) has no
effect. So, I still keep the original Patch #1 to fix one race.
Yes, that's exactly what I meant, and thank you for intending to try it.
But if that patch had "no effect", then I think you were not running the
kernel with that patch applied: because it deletes the BUG on line 213
of mm/truncate.c, which is what you reported in the first mail!
Or, is line 213 of mm/truncate.c in your 5.10.46-hugetext+ kernel
something else? I've been looking at 5.15-rc.
Hi, Hugh
I'm sorry the confusing '5.10.46-hugetext+'. I am also look and test at
5.15-rc.
But I wasn't proposing to delete it merely to hide the BUG: as I hope
I explained, we could move it below the page->mapping check, but it
wouldn't really be of any value there since tails have NULL page->mapping
anyway (well, I didn't check first and second tails, maybe mapping gets
reused for some compound page field in those). I was proposing to delete
it because the page->mapping check then weeds out the racy case once
we're holding page lock, without the need for adding anything special.
Hugh
Today, I try again to create some cases to reproduce the race, such as
ensuring that multiple processes are always executing
‘truncate_pagecache’ and only mapping 2M DSO. In this way, I try to
ensure that the target of 'collapse_file' and 'truncate_pagecache' can
only be the same VMA, to increase the probability of reproducing that
race. But, I can't reproduce that race any more.
In fact, according to the previous experience, the current number of
attempts should be able to reproduce that race.
If you have the idea about creating this case, please tell me, and I can
try again. Or we can solve it when it appears again.
Thanks!