CCing Miaohe Lin On Tue, Aug 13, 2024 at 11:25:56AM GMT, Boris Burkov wrote: > The commit e41c81d0d30e ("mm/truncate: Replace page_mapped() call in > invalidate_inode_page()") replaced the page_mapped(page) check with a > refcount check. However, this refcount check does not work as expected > with drop_caches, at least for btrfs's metadata pages. > > Btrfs has a per-sb metadata inode with cached pages, and when not in > active use by btrfs, they have a refcount of 3. One from the initial > call to alloc_pages, one (nr_pages == 1) from filemap_add_folio, and one > from folio_attach_private. We would expect such pages to get dropped by > drop_caches. However, drop_caches calls into mapping_evict_folio via > mapping_try_invalidate which gets a reference on the folio with > find_lock_entries(). As a result, these pages have a refcount of 4, and > fail this check. > > For what it's worth, such pages do get reclaimed under memory pressure, > and if I change this refcount check to `if folio_mapped(folio)` > > The following script produces such pages and uses drgn to further > analyze the state of the folios: > https://github.com/boryas/scripts/blob/main/sh/strand-meta/run.sh > It should at least outline the basic idea for producing some btrfs > metadata via creating inlined-extent files. > > My proposed fix for the issue is to add one more hardcoded refcount to > this check to account for the caller having a refcount on the page. > However, I am less familiar with the other caller into > mapping_evict_folio in the page fault path, so I am concerned this fix > will not work properly there, and would appreciate extra scrutiny there. > > Fixes: e41c81d0d30e ("mm/truncate: Replace page_mapped() call in invalidate_inode_page()") > Signed-off-by: Boris Burkov <boris@xxxxxx> > --- > mm/truncate.c | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/mm/truncate.c b/mm/truncate.c > index 4d61fbdd4b2f..c710c84710b4 100644 > --- a/mm/truncate.c > +++ b/mm/truncate.c > @@ -267,9 +267,17 @@ long mapping_evict_folio(struct address_space *mapping, struct folio *folio) > return 0; > if (folio_test_dirty(folio) || folio_test_writeback(folio)) > return 0; > - /* The refcount will be elevated if any page in the folio is mapped */ > + /* > + * The refcount will be elevated if any page in the folio is mapped. > + * > + * The refcounts break down as follows: > + * 1 per mapped page > + * 1 from folio_attach_private, if private is set > + * 1 from allocating the page in the first place > + * 1 from the caller > + */ I think the above explanation is correct at least from my code inspection. Most of the callers are related to memory failure. I would reword the "1 per mapped page" to "1 per page in page cache" or something as mapped here might mean mapped in page tables. > if (folio_ref_count(folio) > > - folio_nr_pages(folio) + folio_has_private(folio) + 1) > + folio_nr_pages(folio) + folio_has_private(folio) + 1 + 1) > return 0; > if (!filemap_release_folio(folio, 0)) > return 0; > -- > 2.46.0 > >