On 13.08.24 20:25, Boris Burkov wrote:
The commit e41c81d0d30e ("mm/truncate: Replace page_mapped() call in invalidate_inode_page()") replaced the page_mapped(page) check with a refcount check. However, this refcount check does not work as expected with drop_caches, at least for btrfs's metadata pages. Btrfs has a per-sb metadata inode with cached pages, and when not in active use by btrfs, they have a refcount of 3. One from the initial call to alloc_pages, one (nr_pages == 1) from filemap_add_folio, and one from folio_attach_private. We would expect such pages to get dropped by drop_caches. However, drop_caches calls into mapping_evict_folio via mapping_try_invalidate which gets a reference on the folio with find_lock_entries(). As a result, these pages have a refcount of 4, and fail this check. For what it's worth, such pages do get reclaimed under memory pressure, and if I change this refcount check to `if folio_mapped(folio)` The following script produces such pages and uses drgn to further analyze the state of the folios: https://github.com/boryas/scripts/blob/main/sh/strand-meta/run.sh It should at least outline the basic idea for producing some btrfs metadata via creating inlined-extent files. My proposed fix for the issue is to add one more hardcoded refcount to this check to account for the caller having a refcount on the page. However, I am less familiar with the other caller into mapping_evict_folio in the page fault path, so I am concerned this fix will not work properly there, and would appreciate extra scrutiny there. Fixes: e41c81d0d30e ("mm/truncate: Replace page_mapped() call in invalidate_inode_page()") Signed-off-by: Boris Burkov <boris@xxxxxx> --- mm/truncate.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/mm/truncate.c b/mm/truncate.c index 4d61fbdd4b2f..c710c84710b4 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -267,9 +267,17 @@ long mapping_evict_folio(struct address_space *mapping, struct folio *folio) return 0; if (folio_test_dirty(folio) || folio_test_writeback(folio)) return 0; - /* The refcount will be elevated if any page in the folio is mapped */ + /* + * The refcount will be elevated if any page in the folio is mapped. + * + * The refcounts break down as follows: + * 1 per mapped page
Using "mapped" is confusing -- I would have expected a folio_mapcount() here.
Did you mean "1 reference from the pagecache to each page"?
+ * 1 from folio_attach_private, if private is set + * 1 from allocating the page in the first place + * 1 from the caller + */ if (folio_ref_count(folio) > - folio_nr_pages(folio) + folio_has_private(folio) + 1) + folio_nr_pages(folio) + folio_has_private(folio) + 1 + 1) return 0; if (!filemap_release_folio(folio, 0)) return 0;
-- Cheers, David / dhildenb