On 9 Oct 2023, at 23:36, Huang, Ying wrote:

> Zi Yan <zi.yan@xxxxxxxx> writes:
>
>> From: Zi Yan <ziy@xxxxxxxxxx>
>>
>> nr_failed was missing the rc value from migrate_pages_batch() and can
>> cause a mismatch between migrate_pages() return value and the number of
>> not migrated pages, i.e., when the return value of migrate_pages() is 0,
>> there are still pages left in the from page list. It will happen when a
>> non-PMD THP large folio fails to migrate due to -ENOMEM and is split
>> successfully but not all the split pages are not migrated,
>> migrate_pages_batch() would return non-zero, but astats.nr_thp_split = 0.
>> nr_failed would be 0 and returned to the caller of migrate_pages(), but
>> the not migrated pages are left in the from page list without being added
>> back to LRU lists.
>>
>> Fixes: 2ef7dbb26990 ("migrate_pages: try migrate in batch asynchronously firstly")
>> Signed-off-by: Zi Yan <ziy@xxxxxxxxxx>
>> ---
>>  mm/migrate.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index c602bf6dec97..5348827bd958 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1834,7 +1834,7 @@ static int migrate_pages_sync(struct list_head *from, new_folio_t get_new_folio,
>>  		return rc;
>>  	}
>>  	stats->nr_thp_failed += astats.nr_thp_split;
>> -	nr_failed += astats.nr_thp_split;
>> +	nr_failed += rc + astats.nr_thp_split;
>>  	/*
>>  	 * Fall back to migrate all failed folios one by one synchronously. All
>>  	 * failed folios except split THPs will be retried, so their failure
>
> I don't think this is a correct fix. The failed folios will be retried
> in the following synchronous migration below.
>
> To fix the issue, we should track nr_split for all large folios (not
> only THP), then use
>
> nr_failed += astats.nr_split;

You are suggesting a new stats "nr_split" in addition to nr_thp_split? And
nr_split includes nr_thp_split?

--
Best Regards,
Yan, Zi
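
For illustration only, the accounting question above can be modelled with a
toy C sketch. It is not kernel code: struct toy_stats and the toy_* helpers
are hypothetical names, and the sketch assumes the reading asked about in the
reply, namely that nr_split would count every large folio split and so
include the splits already counted by nr_thp_split.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified counters. This is NOT the kernel's
 * struct migrate_pages_stats; it only models the two fields discussed.
 */
struct toy_stats {
	int nr_thp_split;	/* PMD-sized THPs that were split */
	int nr_split;		/* assumed: every large folio split, THPs included */
};

/* Record the split of one large folio during the asynchronous pass. */
static void toy_record_split(struct toy_stats *s, bool is_pmd_thp)
{
	s->nr_split++;
	if (is_pmd_thp)
		s->nr_thp_split++;
}

/* Failure count derived from THP splits only, as in the unpatched code. */
static int toy_nr_failed_thp_only(const struct toy_stats *s)
{
	return s->nr_thp_split;
}

/* Failure count derived from all large folio splits, as suggested. */
static int toy_nr_failed_all_splits(const struct toy_stats *s)
{
	return s->nr_split;
}
```

Under these assumptions, recording a single non-PMD large folio split makes
toy_nr_failed_thp_only() report 0 while toy_nr_failed_all_splits() reports 1,
which mirrors the mismatch described in the commit message without adding rc
for folios that the synchronous pass will retry.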