On Tue, Nov 21, 2023 at 06:00:40PM +0530, Charan Teja Kalla wrote:
> The below race on a folio between reclaim and migration exposed a bug
> of not populating the swap cache with proper folio resulting into the
> rcu stalls:

Thank you for figuring out this race and describing it so well.
It explains a few things I've seen, at least potentially.

What would you think to this?  I think a better fix would be to fix
the swap cache to use multi-order entries, but I would like to see this
backportable!

diff --git a/mm/migrate.c b/mm/migrate.c
index d9d2b9432e81..2d67ca47d2e2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -405,6 +405,7 @@ int folio_migrate_mapping(struct address_space *mapping,
 	int dirty;
 	int expected_count = folio_expected_refs(mapping, folio) + extra_count;
 	long nr = folio_nr_pages(folio);
+	long entries, i;
 
 	if (!mapping) {
 		/* Anonymous page without mapping */
@@ -442,8 +443,10 @@ int folio_migrate_mapping(struct address_space *mapping,
 			folio_set_swapcache(newfolio);
 			newfolio->private = folio_get_private(folio);
 		}
+		entries = nr;
 	} else {
 		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
+		entries = 1;
 	}
 
 	/* Move dirty while page refs frozen and newpage not yet exposed */
@@ -453,7 +456,11 @@ int folio_migrate_mapping(struct address_space *mapping,
 		folio_set_dirty(newfolio);
 	}
 
-	xas_store(&xas, newfolio);
+	/* Swap cache still stores N entries instead of a high-order entry */
+	for (i = 0; i < entries; i++) {
+		xas_store(&xas, newfolio);
+		xas_next(&xas);
+	}
 
 	/*
 	 * Drop cache reference from old page by unfreezing