On 24.11.21 00:59, Jason Gunthorpe wrote:
> On Tue, Nov 23, 2021 at 11:04:04PM +0100, Vlastimil Babka wrote:
>> On 11/23/21 18:00, Jason Gunthorpe wrote:
>>>
>>>> believe what you say and I trust your experience :) So it could as
>>>> well be that on such "special" (or not so special) systems there
>>>> should be a way to restrict it to privileged users only.
>>>
>>> At this point RDMA is about as "special" as people running large
>>> ZONE_MOVABLE systems, and the two are going to start colliding
>>> heavily. The RDMA VFIO migration driver should be merged soon, which
>>> makes VMs using this stuff finally practical.
>>
>> How does that work? I see the word migration, so does it cause pages to
>
> Sorry, I meant what is often called "VM live migration". Typically that
> cannot be done if a PCI device is assigned to the VM, as suspending and
> then migrating a PCI device to another server is complicated. With
> forthcoming hardware mlx5 can do this, and thus the entire RDMA stack
> becomes practically usable and performant within a VM.
>
>> be migrated out of ZONE_MOVABLE before they are pinned?
>
> GUP already does this automatically for FOLL_LONGTERM.
>
>> Similarly for io_uring we could be migrating pages to be pinned so that
>> they end up consolidated close together, and prevent pathological
>> situations like in David's reproducer.
>
> It is an interesting idea to have GUP do some kind of THP-preserving
> migration.

Unfortunately it will only be a band aid AFAIU. I can rewrite my
reproducer fairly easily to pin the whole 2M range first, pin only a
single page a second time, and then unpin the 2M range, blocking THP in
the very same way. (Though it would block slightly less THP, because I
always need the ability to memlock 2M first.)

-- 
Thanks,

David / dhildenb