On Tue, Apr 30, 2024 at 10:10:43PM -0700, Christoph Hellwig wrote:
> > +        pinned = -ENOMEM;
> > +        int attempts = 0;
> > +        /*
> > +         * pin_user_pages_fast() can return -EAGAIN, due to falling back
> > +         * to gup-slow and then failing to migrate pages out of
> > +         * ZONE_MOVABLE due to a transient elevated page refcount.
> > +         *
> > +         * One retry is enough to avoid this problem, so far, but let's
> > +         * use a slightly higher retry count just in case even larger
> > +         * systems have a longer-lasting transient refcount problem.
> > +         *
> > +         */
> > +        static const int MAX_ATTEMPTS = 3;
> > +
> > +        while (pinned == -EAGAIN && attempts < MAX_ATTEMPTS) {
> > +            pinned = pin_user_pages_fast(cur_base,
> > +                                         min_t(unsigned long,
> > +                                               npages, PAGE_SIZE /
> > +                                               sizeof(struct page *)),
> > +                                         gup_flags, page_list);
> >              ret = pinned;
> > -            goto umem_release;
> > +            attempts++;
> > +
> > +            if (pinned == -EAGAIN)
> > +                continue;
> >          }
> > +        if (pinned < 0)
> > +            goto umem_release;
>
> This doesn't make sense.  IFF a blind retry is all that is needed it
> should be done in the core functionality.  I fear it's not that easy,
> though.

+1

This migration retry weirdness is a GUP issue; it needs to be solved
in the mm, not exposed to every pin_user_pages caller.

If it turns out ZONE_MOVABLE pages can't actually be reliably moved
then it is pretty broken.

Jason
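
To illustrate the layering point only, here is a rough user-space
sketch of what "retry in the core, not in every caller" could look
like. Nothing in it is kernel code: mock_pin_pages(), core_pin_pages(),
caller_pin_all() and CORE_MAX_ATTEMPTS are hypothetical names made up
for this example, and the mock simply fails with -EAGAIN a fixed number
of times. It says nothing about whether a blind retry is actually
sufficient for the migration failure discussed above.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for the low-level pinning primitive. It is NOT
 * pin_user_pages_fast(); it only models "fail transiently with -EAGAIN
 * a few times, then succeed".
 */
static int transient_failures_left;

static int mock_pin_pages(int npages)
{
	if (transient_failures_left > 0) {
		transient_failures_left--;
		return -EAGAIN;
	}
	return npages;	/* number of pages "pinned" */
}

/* Hypothetical retry bound, chosen only for illustration. */
#define CORE_MAX_ATTEMPTS 3

/*
 * Hypothetical core-level wrapper: the bounded retry lives here, once,
 * so individual callers never see a transient -EAGAIN unless every
 * attempt failed.
 */
static int core_pin_pages(int npages)
{
	/* Start from -EAGAIN so the loop body runs at least once. */
	int ret = -EAGAIN;
	int attempts;

	for (attempts = 0;
	     attempts < CORE_MAX_ATTEMPTS && ret == -EAGAIN;
	     attempts++)
		ret = mock_pin_pages(npages);

	return ret;
}

/*
 * A caller then needs no retry logic of its own: any negative return
 * is treated as a real failure.
 */
static int caller_pin_all(int npages)
{
	int pinned = core_pin_pages(npages);

	if (pinned < 0)
		return pinned;
	return 0;
}
```

With the retry kept in one place, the bound and any smarter handling
can change without touching the callers.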