On Mon, Nov 04, 2024 at 07:29:44PM -0800, John Hubbard wrote: > commit 53ba78de064b ("mm/gup: introduce > check_and_migrate_movable_folios()") created a new constraint on the > pin_user_pages*() API family: a potentially large internal allocation > must now occur, for FOLL_LONGTERM cases. > > A user-visible consequence has now appeared: user space can no longer > pin more than 2GB of memory anymore on x86_64. That's because, on a 4KB > PAGE_SIZE system, when user space tries to (indirectly, via a device > driver that calls pin_user_pages()) pin 2GB, this requires an allocation > of a folio pointers array of MAX_PAGE_ORDER size, which is the limit for > kmalloc(). > > In addition to the directly visible effect described above, there is > also the problem of adding an unnecessary allocation. The **pages array > argument has already been allocated, and there is no need for a > redundant **folios array allocation in this case. > > Fix this by avoiding the new allocation entirely. This is done by > referring to either the original page[i] within **pages, or to the > associated folio. Thanks to David Hildenbrand for suggesting this > approach and for providing the initial implementation (which I've tested > and adjusted slightly) as well. > > Fixes: 53ba78de064b ("mm/gup: introduce check_and_migrate_movable_folios()") > Suggested-by: David Hildenbrand <david@xxxxxxxxxx> > Cc: Vivek Kasireddy <vivek.kasireddy@xxxxxxxxx> > Cc: Dave Airlie <airlied@xxxxxxxxxx> > Cc: Gerd Hoffmann <kraxel@xxxxxxxxxx> > Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx> > Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx> > Cc: Jason Gunthorpe <jgg@xxxxxxxxxx> > Cc: Peter Xu <peterx@xxxxxxxxxx> > Cc: Arnd Bergmann <arnd@xxxxxxxx> > Cc: Daniel Vetter <daniel.vetter@xxxxxxxx> > Cc: Dongwon Kim <dongwon.kim@xxxxxxxxx> > Cc: Hugh Dickins <hughd@xxxxxxxxxx> > Cc: Junxiao Chang <junxiao.chang@xxxxxxxxx> > Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx> > Cc: Oscar Salvador <osalvador@xxxxxxx> > Cc: linux-stable@xxxxxxxxxxxxxxx > Signed-off-by: John Hubbard <jhubbard@xxxxxxxxxx> Hi John, thanks for doing this. Reviewed-by: Oscar Salvador <osalvador@xxxxxxx> Nit below: > +static int > +migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list, > + struct pages_or_folios *pofs) > { > int ret; > unsigned long i; > > - for (i = 0; i < nr_folios; i++) { > - struct folio *folio = folios[i]; > + for (i = 0; i < pofs->nr_entries; i++) { > + struct folio *folio = pofs_get_folio(pofs, i); > > if (folio_is_device_coherent(folio)) { > /* > @@ -2344,7 +2380,7 @@ static int migrate_longterm_unpinnable_folios( > * convert the pin on the source folio to a normal > * reference. > */ > - folios[i] = NULL; > + pofs_clear_entry(pofs, i); > folio_get(folio); > gup_put_folio(folio, 1, FOLL_PIN); > > @@ -2363,8 +2399,8 @@ static int migrate_longterm_unpinnable_folios( > * calling folio_isolate_lru() which takes a reference so the > * folio won't be freed if it's migrating. > */ > - unpin_folio(folios[i]); > - folios[i] = NULL; > + unpin_folio(pofs_get_folio(pofs, i)); We already retrieved the folio before, cannot we just bypass pofs_get_folio() here? -- Oscar Salvador SUSE Labs