On Fri, Jan 08, 2021 at 08:42:55PM -0400, Jason Gunthorpe wrote:
> On Fri, Jan 08, 2021 at 05:43:56PM -0500, Andrea Arcangeli wrote:
> > On Fri, Jan 08, 2021 at 02:19:45PM -0400, Jason Gunthorpe wrote:
> > > On Fri, Jan 08, 2021 at 12:00:36PM -0500, Andrea Arcangeli wrote:
> > > > > The majority cannot be converted to notifiers because they are DMA
> > > > > based. Every one of those is an ABI for something, and does not expect
> > > > > extra privilege to function. It would be a major breaking change to
> > > > > have pin_user_pages require some cap.
> > > >
> > > > ... what makes them safe is to be transient GUP pin and not long
> > > > term.
> > > >
> > > > Please note the "long term" in the underlined line.
> > >
> > > Many of them are long term, though only 50 or so have been marked
> > > specifically with FOLL_LONGTERM. I don't see how we can make such a
> > > major ABI break.
> >
> > io_uring is one of those indeed and I already flagged it.
> >
> > This isn't a black and white issue, kernel memory is also pinned but
> > it's not in movable pageblocks... How do you tell the VM in GUP to
> > migrate memory to a non movable pageblock before pinning it? Because
> > that's what it should do to create less breakage.
>
> There is already a patch series floating about to do exactly that for
> FOLL_LONGTERM pins based on the existing code in GUP for CMA migration
>
> > For example iommu obviously need to be privileged, if your argument
> > that it's enough to use the right API to take long term pins
> > unconstrained, that's not the case. Pins are pins and prevent moving
> > or freeing the memory, their effect is the same and again worse than
> > mlock on many levels.
>
> The ship sailed on this a decade ago, it is completely infeasible to
> go back now, it would completely break widely used things like GPU,
> RDMA and more.
>

I am late to this, but GPUs should not be used as an excuse for GUP. GUP
is a broken model, and the way GPU drivers use GUP is less broken than
the way RDMA does.
In GPU drivers, the GUP contract with userspace is that the data the GPU
can access is a snapshot of what the process memory was at the time you
asked for the GUP. The process can start using different pages right
afterwards. There is no constant coherency contract (i.e. the CPU and GPU
can be working on different pages).

If you want coherency, i.e. the CPU and GPU always working on the same
page, then you need to use an mmu notifier and avoid pinning pages.
Anything that does not abide by mmu notifiers is broken and cannot be
fixed.

Cheers,
Jérôme
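
To illustrate the snapshot versus coherent distinction drawn in the reply above, here is a toy userspace model in C. It is only a sketch under stated assumptions: every name in it (toy_mm, toy_dev, toy_pin_snapshot, toy_mm_remap, toy_dev_access) is hypothetical and none of it is kernel API. A "pin" simply copies the current virtual-page-to-frame mapping into the device once, while a subscribed device gets its copy invalidated whenever the mapping changes and looks it up again on its next access, which is roughly the role an mmu notifier plays.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NPAGES   4
#define TOY_NO_FRAME (-1)

/* Hypothetical device state: the frame the device uses for each virtual page. */
struct toy_dev {
    int frame[TOY_NPAGES];
};

/* Hypothetical address space: virtual page number -> physical frame number. */
struct toy_mm {
    int frame[TOY_NPAGES];
    int next_frame;            /* next unused frame number */
    struct toy_dev *notifier;  /* device told about mapping changes, or NULL */
};

void toy_dev_init(struct toy_dev *dev)
{
    for (int i = 0; i < TOY_NPAGES; i++)
        dev->frame[i] = TOY_NO_FRAME;
}

void toy_mm_init(struct toy_mm *mm)
{
    for (int i = 0; i < TOY_NPAGES; i++)
        mm->frame[i] = i;      /* identity mapping to start with */
    mm->next_frame = TOY_NPAGES;
    mm->notifier = NULL;
}

/* Snapshot model: record the frame once; later remaps are never seen. */
void toy_pin_snapshot(const struct toy_mm *mm, struct toy_dev *dev, int vpn)
{
    assert(vpn >= 0 && vpn < TOY_NPAGES);
    dev->frame[vpn] = mm->frame[vpn];
}

/* Coherent model: the device asks to be told about every mapping change. */
void toy_dev_subscribe(struct toy_mm *mm, struct toy_dev *dev)
{
    mm->notifier = dev;
}

/*
 * The process starts using a different frame for vpn (think COW or
 * munmap + mmap). A subscribed device is invalidated first.
 */
int toy_mm_remap(struct toy_mm *mm, int vpn)
{
    assert(vpn >= 0 && vpn < TOY_NPAGES);
    if (mm->notifier)
        mm->notifier->frame[vpn] = TOY_NO_FRAME;
    mm->frame[vpn] = mm->next_frame++;
    return mm->frame[vpn];
}

/* Device access: "fault in" the current frame only if nothing is cached. */
int toy_dev_access(const struct toy_mm *mm, struct toy_dev *dev, int vpn)
{
    assert(vpn >= 0 && vpn < TOY_NPAGES);
    if (dev->frame[vpn] == TOY_NO_FRAME)
        dev->frame[vpn] = mm->frame[vpn];
    return dev->frame[vpn];
}
```

In this model, after toy_mm_remap() a device that used toy_pin_snapshot() keeps returning the old frame from toy_dev_access(), while a device registered with toy_dev_subscribe() returns the new one. The old frame stays valid for the pinned device (in the real kernel the pin holds a reference), so the result is stale data, not a use-after-free.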