Hi David, > > On 02.05.24 20:10, John Hubbard wrote: > > On 5/1/24 11:56 PM, David Hildenbrand wrote: > >> On 02.05.24 03:05, Alistair Popple wrote: > >>> Jason Gunthorpe <jgg@xxxxxxxxxx> writes: > > ... > >>>>> This doesn't make sense. IFF a blind retry is all that is needed > >>>>> it should be done in the core functionality. I fear it's not that > >>>>> easy, though. > >>>> > >>>> +1 > >>>> > >>>> This migration retry weirdness is a GUP issue, it needs to be > >>>> solved in the mm not exposed to every pin_user_pages caller. > >>>> > >>>> If it turns out ZONE_MOVEABLE pages can't actually be reliably > >>>> moved then it is pretty broken.. > >>> > >>> I wonder if we should remove the arbitrary retry limit in > >>> migrate_pages() entirely for ZONE_MOVEABLE pages and just loop until > >>> they migrate? By definition there should only be transient > >>> references on these pages so why do we need to limit the number of > >>> retries in the first place? > >> > >> There are some weird things that still needs fixing: vmsplice() is > >> the example that doesn't use FOLL_LONGTERM. > >> > > > > Hi David! > > > > Sorry for the late reply! > > > Do you have any other call sites in mind? It sounds like one way > > forward is to fix each call site... > > We also have udmabuf, that is currently getting fixed [1] similarly to how we > handle GUP. Could you and/or Jason also have a look at the GUP-related > bits? I acked it but the patch set does not seem to make progress. > > https://lkml.kernel.org/r/20240411070157.3318425-1-vivek.kasireddy@xxxxxxxxx I am hoping Andrew would pick this series up for the upcoming 6.10 cycle given that most patches have Acks and I was planning to maintain the udmabuf bits. Thanks, Vivek > > The sad story is: > > (a) vmsplice() is harder to fix (identify all put_page() and replace them by > unpin_user_page()), but will get fixed at some point. > > (b) even !longterm DMA can take double-digit seconds > > (c) other wrong code is likely to exist or to appear again and it's > rather hard to identify+prevent reliably > > IMHO we should expect migration to take a longer time and maybe never > happening in some cases. > > Memory offlining (e.g., echo "offline" > > sys/devices/system/memory/memory0/state) currently tries forever to > migrate pages and can be killed if started from user space using a fatal signal. > If memory offlining happens from kernel context (unplugging DIMM, ACPI > code triggers offlining), we'd much rather want to fail at some point instead > of looping forever, but it hasn't really popped up as a problem so far. > > virtio-mem uses alloc_contig_range() for best-effort allocation and will skip > such temporarily unmovable ranges to try again later. Here, we really don't > want to loop forever in migration code but rather fail earlier and try unplug > of another memory block. > > So as long as page pinning is triggered from user context where the user can > simply abort the process (e.g., kill the process), sleep+retry on > ZONE_MOVABLE + MIGRATE_CMA sounds reasonable. > > > > > This is an unhappy story right now: the pin_user_pages*() APIs are > > significantly worse than before the "migrate pages away automatically" > > upgrade, from a user point of view. Because now, the APIs fail > > intermittently for callers who follow the rules--because > > pin_user_pages() is not fully working yet, basically. > > > > Other ideas, large or small, about how to approach a fix? > > What Jason says makes sense to me: sleep+retry. My only concern is when > pin_user_pages_*() is called from non-killable context where failing at some > point might be more reasonable. But maybe that use case doesn't really > exist? > > -- > Cheers, > > David / dhildenb