On 20.08.24 10:00, David Hildenbrand wrote:
On 13.08.24 20:25, Boris Burkov wrote:
The commit e41c81d0d30e ("mm/truncate: Replace page_mapped() call in
invalidate_inode_page()") replaced the page_mapped(page) check with a
refcount check. However, this refcount check does not work as expected
with drop_caches, at least for btrfs's metadata pages.
Btrfs has a per-sb metadata inode with cached pages, and when not in
active use by btrfs, they have a refcount of 3. One from the initial
call to alloc_pages, one (nr_pages == 1) from filemap_add_folio, and one
from folio_attach_private. We would expect such pages to get dropped by
drop_caches. However, drop_caches calls into mapping_evict_folio via
mapping_try_invalidate which gets a reference on the folio with
find_lock_entries(). As a result, these pages have a refcount of 4, and
fail this check.
For what it's worth, such pages do get reclaimed under memory pressure,
and if I change this refcount check to `if folio_mapped(folio)`
The following script produces such pages and uses drgn to further
analyze the state of the folios:
https://github.com/boryas/scripts/blob/main/sh/strand-meta/run.sh
It should at least outline the basic idea for producing some btrfs
metadata via creating inlined-extent files.
My proposed fix for the issue is to add one more hardcoded refcount to
this check to account for the caller having a refcount on the page.
However, I am less familiar with the other caller into
mapping_evict_folio in the page fault path, so I am concerned this fix
will not work properly there, and would appreciate extra scrutiny there.
Fixes: e41c81d0d30e ("mm/truncate: Replace page_mapped() call in invalidate_inode_page()")
Signed-off-by: Boris Burkov <boris@xxxxxx>
---
mm/truncate.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/mm/truncate.c b/mm/truncate.c
index 4d61fbdd4b2f..c710c84710b4 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -267,9 +267,17 @@ long mapping_evict_folio(struct address_space *mapping, struct folio *folio)
return 0;
if (folio_test_dirty(folio) || folio_test_writeback(folio))
return 0;
- /* The refcount will be elevated if any page in the folio is mapped */
+ /*
+ * The refcount will be elevated if any page in the folio is mapped.
+ *
+ * The refcounts break down as follows:
+ * 1 per mapped page
Using "mapped" is confusing -- I would have expected a folio_mapcount()
here.
Did you mean "1 reference from the pagecache to each page"?
... and now I spotted that Willy had the same comment already, good :)
--
Cheers,
David / dhildenb