On Tue, Nov 22, 2022 at 07:08:25PM +0100, Daniel Vetter wrote:
> On Tue, 22 Nov 2022 at 19:04, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> >
> > On Tue, Nov 22, 2022 at 06:08:00PM +0100, Daniel Vetter wrote:
> > > tldr; DMA buffers aren't normal memory, expecting that you can use
> > > them like that (like calling get_user_pages works, or that they're
> > > accounted like any other normal memory) cannot be guaranteed.
> > >
> > > Since some userspace only runs on integrated devices, where all
> > > buffers are actually resident system memory, there's a huge
> > > temptation to assume that a struct page is always present and usable
> > > like for any other pagecache-backed mmap. This has the potential to
> > > result in a uapi nightmare.
> > >
> > > To close this gap, require that DMA buffer mmaps are VM_PFNMAP, which
> > > blocks get_user_pages and all the other struct page based
> > > infrastructure for everyone. In spirit this is the uapi counterpart to
> > > the kernel-internal CONFIG_DMABUF_DEBUG.
> > >
> > > Motivated by a recent patch which wanted to switch the system dma-buf
> > > heap to vm_insert_page instead of vm_insert_pfn.
> > >
> > > v2:
> > >
> > > Jason brought up that we also want to guarantee that all ptes have the
> > > pte_special flag set, to catch fast get_user_pages (on architectures
> > > that support this). Allowing VM_MIXEDMAP (like VM_SPECIAL does) would
> > > still allow vm_insert_page, but limiting to VM_PFNMAP will catch that.
> > >
> > > From auditing the various functions to insert pfn pte entries
> > > (vm_insert_pfn_prot, remap_pfn_range and all its callers like
> > > dma_mmap_wc) it looks like VM_PFNMAP is already required anyway, so
> > > this should be the correct flag to check for.
> >
> > I didn't look at how this actually gets used, but it is a bit of a
> > pain to insert a lifetime-controlled object like a struct page as a
> > special PTE/VM_PFNMAP.
> >
> > How is the lifetime model implemented here? How do you know when
> > userspace has finally unmapped the page?
>
> The vma has a filp which is the refcounted dma_buf. With dma_buf you
> never get an individual page; it's always the entire object. And it's
> up to the allocator how exactly it wants to use or not use the page's
> refcount. So if gup goes in and elevates the refcount, you can break
> stuff, which is why I'm doing this.

But how does move work?

Jason
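[For context, a minimal sketch of the enforcement the commit message
describes, modeled on dma_buf_mmap_internal() in
drivers/dma-buf/dma-buf.c. The bounds checks are assumed from the
upstream file; this illustrates the kind of check being proposed, not
the exact diff under discussion.]

#include <linux/dma-buf.h>
#include <linux/mm.h>

static int dma_buf_mmap_internal(struct file *file,
				 struct vm_area_struct *vma)
{
	struct dma_buf *dmabuf = file->private_data;
	int ret;

	/* check if buffer supports mmap */
	if (!dmabuf->ops->mmap)
		return -EINVAL;

	/* check for overflowing the buffer's size */
	if (vma->vm_pgoff + vma_pages(vma) >
	    dmabuf->size >> PAGE_SHIFT)
		return -EINVAL;

	/* let the exporter set up the mapping */
	ret = dmabuf->ops->mmap(dmabuf, vma);

	/*
	 * Enforce the uapi guarantee: exporters must map through
	 * pfn-based helpers (remap_pfn_range, vm_insert_pfn, ...),
	 * which set VM_PFNMAP and make every pte pte_special, so
	 * get_user_pages (fast and slow) can never grab these pages.
	 * vm_insert_page would leave VM_PFNMAP unset and trip this.
	 */
	WARN_ON(!(vma->vm_flags & VM_PFNMAP));

	return ret;
}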