On Thu, Dec 17, 2020 at 01:52:41PM -0500, Pavel Tatashin wrote:

> +/*
> + * Verify that there are no unpinnable (movable) pages, if so return true.
> + * Otherwise an unpinnable pages is found return false, and unpin all pages.
> + */
> +static bool check_and_unpin_pages(unsigned long nr_pages, struct page **pages,
> +				  unsigned int gup_flags)
> +{
> +	unsigned long i, step;
> +
> +	for (i = 0; i < nr_pages; i += step) {
> +		struct page *head = compound_head(pages[i]);
> +
> +		step = compound_nr(head) - (pages[i] - head);

You can't assume that all of a compound head is in the pages array,
this assumption would only work inside the page walkers if the page
was found in a PMD or something.

> +			if (gup_flags & FOLL_PIN) {
> +				unpin_user_pages(pages, nr_pages);

So we throw everything away? Why? That isn't how the old algorithm
worked.

> @@ -1654,22 +1664,55 @@ static long __gup_longterm_locked(struct mm_struct *mm,
>  				  struct vm_area_struct **vmas,
>  				  unsigned int gup_flags)
>  {
> -	unsigned long flags = 0;
> +	int migrate_retry = 0;
> +	int isolate_retry = 0;
> +	unsigned int flags;
>  	long rc;
>
> -	if (gup_flags & FOLL_LONGTERM)
> -		flags = memalloc_pin_save();
> +	if (!(gup_flags & FOLL_LONGTERM))
> +		return __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
> +					       NULL, gup_flags);
>
> -	rc = __get_user_pages_locked(mm, start, nr_pages, pages, vmas, NULL,
> -				     gup_flags);
> +	/*
> +	 * Without FOLL_WRITE fault handler may return zero page, which can
> +	 * be in a movable zone, and also will fail to isolate during migration,
> +	 * thus the longterm pin will fail.
> +	 */
> +	gup_flags &= FOLL_WRITE;

Is &= what you mean here? |= right?

Seems like we've ended up in a weird place if FOLL_LONGTERM always
includes FOLL_WRITE. Putting the zero page in ZONE_MOVABLE seems like
a bad idea, no?

> +	/*
> +	 * Migration may fail, we retry before giving up. Also, because after
> +	 * migration pages[] becomes outdated, we unpin and repin all pages
> +	 * in the range, so pages array is repopulated with new values.
> +	 * Also, because of this we cannot retry migration failures in a loop
> +	 * without pinning/unpinnig pages.
> +	 */

The old algorithm made continuous forward progress and only went back
to the first migration point.

> +	for (; ; ) {

while (true)?

> +		rc = __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
> +					     NULL, gup_flags);

> +		/* Return if error or if all pages are pinnable */
> +		if (rc <= 0 || check_and_unpin_pages(rc, pages, gup_flags))
> +			break;

So we sweep the pages list twice now?

> +		/* Some pages are not pinnable, migrate them */
> +		rc = migrate_movable_pages(rc, pages);
> +
> +		/*
> +		 * If there is an error, and we tried maximum number of times
> +		 * bail out. Notice: we return an error code, and all pages are
> +		 * unpinned
> +		 */
> +		if (rc < 0 && migrate_retry++ >= PINNABLE_MIGRATE_MAX) {
> +			break;
> +		} else if (rc > 0 && isolate_retry++ >= PINNABLE_ISOLATE_MAX) {
> +			rc = -EBUSY;

I don't like this at all. It shouldn't be so flakey.

Can you do migration without the LRU?

Jason
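
A note on the &= versus |= question raised above: the standalone sketch below shows why the two are not interchangeable on a flags word. It is only an illustration, not kernel code. The DEMO_* names and their bit values are made up for this example and are not taken from the kernel headers.

```c
/*
 * Illustration only: DEMO_* flags are hypothetical stand-ins, not the
 * kernel's FOLL_* definitions.
 */
#define DEMO_FOLL_WRITE		0x01u
#define DEMO_FOLL_PIN		0x02u
#define DEMO_FOLL_LONGTERM	0x04u

/* "flags &= WRITE": clears every bit except the write bit. */
static unsigned int keep_only_write(unsigned int flags)
{
	flags &= DEMO_FOLL_WRITE;
	return flags;
}

/* "flags |= WRITE": keeps the caller's bits and sets the write bit. */
static unsigned int add_write(unsigned int flags)
{
	flags |= DEMO_FOLL_WRITE;
	return flags;
}
```

With a caller passing DEMO_FOLL_LONGTERM | DEMO_FOLL_PIN, keep_only_write() returns 0, so the pin and longterm bits are lost, while add_write() returns all three bits set.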