David Hildenbrand <david@xxxxxxxxxx> writes: > On 16.10.24 22:22, John Hubbard wrote: >> If a driver tries to call any of the pin_user_pages*(FOLL_LONGTERM) >> family of functions, and requests "too many" pages, then the call will >> erroneously leave pages pinned. This is visible in user space as an >> actual memory leak. >> Repro is trivial: just make enough pin_user_pages(FOLL_LONGTERM) >> calls >> to exhaust memory. >> The root cause of the problem is this sequence, within >> __gup_longterm_locked(): >> __get_user_pages_locked() >> rc = check_and_migrate_movable_pages() >> ...which gets retried in a loop. The loop error handling is >> incomplete, >> clearly due to a somewhat unusual and complicated tri-state error API. >> But anyway, if -ENOMEM, or in fact, any unexpected error is returned >> from check_and_migrate_movable_pages(), then __gup_longterm_locked() >> happily returns the error, while leaving the pages pinned. >> In the failed case, which is an app that requests (via a device >> driver) >> 30720000000 bytes to be pinned, and then exits, I see this: >> $ grep foll /proc/vmstat >> nr_foll_pin_acquired 7502048 >> nr_foll_pin_released 2048 >> And after applying this patch, it returns to balanced pins: >> $ grep foll /proc/vmstat >> nr_foll_pin_acquired 7502048 >> nr_foll_pin_released 7502048 >> Fix this by unpinning the pages that __get_user_pages_locked() has >> pinned, in such error cases. >> Fixes: 24a95998e9ba ("mm/gup.c: simplify and fix >> check_and_migrate_movable_pages() return codes") >> Cc: Alistair Popple <apopple@xxxxxxxxxx> >> Cc: Shigeru Yoshida <syoshida@xxxxxxxxxx> >> Cc: David Hildenbrand <david@xxxxxxxxxx> >> Cc: Jason Gunthorpe <jgg@xxxxxxxxxx> >> Cc: Minchan Kim <minchan@xxxxxxxxxx> >> Cc: Pasha Tatashin <pasha.tatashin@xxxxxxxxxx> >> Signed-off-by: John Hubbard <jhubbard@xxxxxxxxxx> >> --- >> mm/gup.c | 11 +++++++++++ >> 1 file changed, 11 insertions(+) >> diff --git a/mm/gup.c b/mm/gup.c >> index a82890b46a36..24acf53c8294 100644 >> --- a/mm/gup.c >> +++ b/mm/gup.c >> @@ -2492,6 +2492,17 @@ static long __gup_longterm_locked(struct mm_struct *mm, >> /* FOLL_LONGTERM implies FOLL_PIN */ >> rc = check_and_migrate_movable_pages(nr_pinned_pages, pages); >> + >> + /* >> + * The __get_user_pages_locked() call happens before we know >> + * that whether it's possible to successfully complete the whole >> + * operation. To compensate for this, if we get an unexpected >> + * error (such as -ENOMEM) then we must unpin everything, before >> + * erroring out. >> + */ >> + if (rc != -EAGAIN && rc != 0) >> + unpin_user_pages(pages, nr_pinned_pages); >> + >> } while (rc == -EAGAIN); > > Wouldn't it be cleaner to simply have here after the loop (possibly > even after the memalloc_pin_restore()) > > if (rc) > unpin_user_pages(pages, nr_pinned_pages); > > But maybe I am missing something. I initially thought the same thing but I'm not sure it is correct. Consider what happens when __get_user_pages_locked() fails earlier in the loop. In this case it will have bailed out of the loop with rc <= 0 but we shouldn't call unpin_user_pages(). >> memalloc_pin_restore(flags); >> return rc ? rc : nr_pinned_pages;