On Thu, Jul 07, 2022 at 07:50:17PM -0700, Hugh Dickins wrote: > On Wed, 8 Jun 2022, Matthew Wilcox (Oracle) wrote: > > > Now that both callers have a folio, convert this function to > > take a folio & rename it. > > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx> > > Reviewed-by: Christoph Hellwig <hch@xxxxxx> > > --- > > mm/migrate.c | 19 ++++++++++++------- > > 1 file changed, 12 insertions(+), 7 deletions(-) > > > > diff --git a/mm/migrate.c b/mm/migrate.c > > index 2975f0c4d7cf..2e2f41572066 100644 > > --- a/mm/migrate.c > > +++ b/mm/migrate.c > > @@ -336,13 +336,18 @@ void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd) > > } > > #endif > > > > -static int expected_page_refs(struct address_space *mapping, struct page *page) > > +static int folio_expected_refs(struct address_space *mapping, > > + struct folio *folio) > > { > > - int expected_count = 1; > > + int refs = 1; > > + if (!mapping) > > + return refs; > > > > - if (mapping) > > - expected_count += compound_nr(page) + page_has_private(page); > > - return expected_count; > > + refs += folio_nr_pages(folio); > > + if (folio_get_private(folio)) > > + refs++; > > + > > + return refs; > > } > > > > /* > > @@ -359,7 +364,7 @@ int folio_migrate_mapping(struct address_space *mapping, > > XA_STATE(xas, &mapping->i_pages, folio_index(folio)); > > struct zone *oldzone, *newzone; > > int dirty; > > - int expected_count = expected_page_refs(mapping, &folio->page) + extra_count; > > + int expected_count = folio_expected_refs(mapping, folio) + extra_count; > > long nr = folio_nr_pages(folio); > > > > if (!mapping) { > > @@ -669,7 +674,7 @@ static int __buffer_migrate_folio(struct address_space *mapping, > > return migrate_page(mapping, &dst->page, &src->page, mode); > > > > /* Check whether page does not have extra refs before we do more work */ > > - expected_count = expected_page_refs(mapping, &src->page); > > + expected_count = folio_expected_refs(mapping, src); > > if (folio_ref_count(src) != expected_count) > > return -EAGAIN; > > > > -- > > 2.35.1 > > This commit (742e89c9e352d38df1a5825fe40c4de73a5d5f7a in pagecache.git > folio/for-next and recent linux-next) is dangerously wrong, at least > for swapcache, and probably for some others. > > I say "dangerously" because it tells page migration a swapcache page > is safe for migration when it certainly is not. > > The fun that typically ensues is kernel BUG at include/linux/mm.h:750! > put_page_testzero() VM_BUG_ON_PAGE(page_ref_count(page) == 0, page), > if CONFIG_DEBUG_VM=y (bisecting for that is what brought me to this). > But I guess you might get silent data corruption too. > > I assumed at first that you'd changed the rules, and were now expecting > any subsystem that puts a non-zero value into folio->private to raise > its refcount - whereas the old convention (originating with buffer heads) > is that setting PG_private says an extra refcount has been taken, please > call try_to_release_page() to lower it, and maybe that will use data in > page->private to do so; but page->private free for the subsystem owning > the page to use as it wishes, no refcount implication. But that you > had missed updating swapcache. > > So I got working okay with the patch below; but before turning it into > a proper patch, noticed that there were still plenty of other places > applying the test for PG_private: so now think that maybe you set out > with intention as above, realized it wouldn't work, but got distracted > before cleaning up some places you'd already changed. And patch below > now goes in the wrong direction. > > Or maybe you didn't intend any change, but the PG_private test just got > missed in a few places. I don't know, hope you remember, but current > linux-next badly inconsistent. > Over to you, thanks, Ugh. The problem I'm trying to solve is that we're short on page flags. We _seemed_ to have correlation between "page->private != NULL" and "PG_private is set", and so I thought I could make progress towards removing PG_private. But the rule you set out above wasn't written down anywhere that I was able to find it. I'm about to go to sleep, but I'll think on this some more tomorrow. > Hugh > > --- a/mm/migrate.c 2022-07-06 14:24:44.499941975 -0700 > +++ b/mm/migrate.c 2022-07-06 15:49:25.000000000 -0700 > @@ -351,6 +351,10 @@ unlock: > } > #endif > > +static inline bool folio_counted_private(struct folio *folio) > +{ > + return !folio_test_swapcache(folio) && folio_get_private(folio); > +} > static int folio_expected_refs(struct address_space *mapping, > struct folio *folio) > { > @@ -359,7 +363,7 @@ static int folio_expected_refs(struct ad > return refs; > > refs += folio_nr_pages(folio); > - if (folio_get_private(folio)) > + if (folio_counted_private(folio)) > refs++; > > return refs; > --- a/mm/vmscan.c 2022-07-06 14:24:44.531942217 -0700 > +++ b/mm/vmscan.c 2022-07-06 15:49:37.000000000 -0700 > @@ -2494,6 +2494,10 @@ shrink_inactive_list(unsigned long nr_to > * The downside is that we have to touch folio->_refcount against each folio. > * But we had to alter folio->flags anyway. > */ > +static inline bool folio_counted_private(struct folio *folio) > +{ > + return !folio_test_swapcache(folio) && folio_get_private(folio); > +} > static void shrink_active_list(unsigned long nr_to_scan, > struct lruvec *lruvec, > struct scan_control *sc, > @@ -2538,8 +2542,9 @@ static void shrink_active_list(unsigned > } > > if (unlikely(buffer_heads_over_limit)) { > - if (folio_get_private(folio) && folio_trylock(folio)) { > - if (folio_get_private(folio)) > + if (folio_counted_private(folio) && > + folio_trylock(folio)) { > + if (folio_counted_private(folio)) > filemap_release_folio(folio, 0); > folio_unlock(folio); > }