Jason Gunthorpe <jgg@xxxxxxxxxx> writes: > On Thu, Jun 23, 2022 at 02:21:39PM -0600, Alex Williamson wrote: > >> check_and_migrate_movable_pages() perpetually returns zero. I believe >> this is because folio_is_pinnable() previously returned true, and now >> returns false. > > Indeed, it is a bug that check_and_migrate_movable_pages() returns > 0 when it didn't do anything. It should return an error code. > > Hum.. Alistair, maybe you should look at this as well, I'm struggling > alot to understand how it is safe to drop the reference on the page > but hold a pointer to it on the movable_page_list - sure it was > isolated - but why does that mean it won't be concurrently unmapped > and freed? folio_isolate_lru() takes a reference on the page so you're safe from it being freed. If it gets unmapped it will be freed when the matching putback_movable_pages() is called. > Anyhow, it looks like the problem is the tortured logic in this > function, what do you think about this: At a glance it seems reasonable, although I fear it might conflict with my changes for device coherent migration. Agree the whole check_and_migrate_movable_pages() logic is pretty tortured though, and I don't think I'm making it better so would be happy to try cleaning it up futher once the device coherent changes are in. > diff --git a/mm/gup.c b/mm/gup.c > index 5512644076246d..2ffcb3f4ff4a7b 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -1879,10 +1879,15 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages, > unsigned int gup_flags) > { > unsigned long isolation_error_count = 0, i; > + struct migration_target_control mtc = { > + .nid = NUMA_NO_NODE, > + .gfp_mask = GFP_USER | __GFP_NOWARN, > + }; > struct folio *prev_folio = NULL; > LIST_HEAD(movable_page_list); > bool drain_allow = true; > - int ret = 0; > + int not_migrated; > + int ret; > > for (i = 0; i < nr_pages; i++) { > struct folio *folio = page_folio(pages[i]); > @@ -1919,16 +1924,13 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages, > folio_nr_pages(folio)); > } > > - if (!list_empty(&movable_page_list) || isolation_error_count) > - goto unpin_pages; > - > /* > * If list is empty, and no isolation errors, means that all pages are > - * in the correct zone. > + * in the correct zone, nothing to do. > */ > - return nr_pages; > + if (list_empty(&movable_page_list) && !isolation_error_count) > + return nr_pages; > > -unpin_pages: > if (gup_flags & FOLL_PIN) { > unpin_user_pages(pages, nr_pages); > } else { > @@ -1936,20 +1938,22 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages, > put_page(pages[i]); > } > > - if (!list_empty(&movable_page_list)) { > - struct migration_target_control mtc = { > - .nid = NUMA_NO_NODE, > - .gfp_mask = GFP_USER | __GFP_NOWARN, > - }; > + if (isolation_error_count) { > + ret = -EINVAL; > + goto err_putback; > + } > > - ret = migrate_pages(&movable_page_list, alloc_migration_target, > - NULL, (unsigned long)&mtc, MIGRATE_SYNC, > - MR_LONGTERM_PIN, NULL); > - if (ret > 0) /* number of pages not migrated */ > - ret = -ENOMEM; > + not_migrated = migrate_pages(&movable_page_list, alloc_migration_target, > + NULL, (unsigned long)&mtc, MIGRATE_SYNC, > + MR_LONGTERM_PIN, NULL); > + if (not_migrated > 0) { > + ret = -ENOMEM; > + goto err_putback; > } > + return 0; > > - if (ret && !list_empty(&movable_page_list)) > +err_putback: > + if (!list_empty(&movable_page_list)) > putback_movable_pages(&movable_page_list); > return ret; > } > >> If I generate an errno here, QEMU reports failing on the pc.rom memory >> region at 0xc0000. Thanks, > > Ah, a ROM region that is all zero'd makes some sense why it has gone > unnoticed as a bug. > > Jason