On 9 Aug 2024, at 9:47, David Hildenbrand wrote: > On 09.08.24 12:31, Dev Jain wrote: >> As already being done in __migrate_folio(), wherein we backoff if the >> folio refcount is wrong, make this check during the unmapping phase, upon >> the failure of which, the original state of the PTEs will be restored and >> the folio lock will be dropped via migrate_folio_undo_src(), any racing >> thread will make progress and migration will be retried. >> >> Signed-off-by: Dev Jain <dev.jain@xxxxxxx> >> --- >> mm/migrate.c | 9 +++++++++ >> 1 file changed, 9 insertions(+) >> >> diff --git a/mm/migrate.c b/mm/migrate.c >> index e7296c0fb5d5..477acf996951 100644 >> --- a/mm/migrate.c >> +++ b/mm/migrate.c >> @@ -1250,6 +1250,15 @@ static int migrate_folio_unmap(new_folio_t get_new_folio, >> } >> if (!folio_mapped(src)) { >> + /* >> + * Someone may have changed the refcount and maybe sleeping >> + * on the folio lock. In case of refcount mismatch, bail out, >> + * let the system make progress and retry. >> + */ >> + struct address_space *mapping = folio_mapping(src); >> + >> + if (folio_ref_count(src) != folio_expected_refs(mapping, src)) >> + goto out; > > This really seems to be the latest point where we can "easily" back off and unlock the source folio -- in this function :) > > I wonder if we should be smarter in the migrate_pages_batch() loop when we start the actual migrations via migrate_folio_move(): if we detect that a folio has unexpected references *and* it has waiters (PG_waiters), back off then and retry the folio later. If it only has unexpected references, just keep retrying: no waiters -> nobody is waiting for the lock to make progress. > > For example, when migrate_folio_move() fails with -EAGAIN, check if there are waiters (PG_waiter?) and undo+unlock to try again later. > > But I'm not really a migration expert, so only my 2 cents :) Sounds reasonable to me. Also add Ying (the author of batch page migration). -- Best Regards, Yan, Zi
Attachment:
signature.asc
Description: OpenPGP digital signature