Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

Jason Gunthorpe <jgg@xxxxxxxx> · Mon, 5 Oct 2020 14:28:54 -0300

On Sun, Oct 04, 2020 at 06:09:29PM +0200, Daniel Vetter wrote:
> On Sun, Oct 4, 2020 at 2:51 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> >
> > On Sat, Oct 03, 2020 at 11:40:22AM +0200, Daniel Vetter wrote:
> >
> > > > That leaves the only interesting places as vb2_dc_get_userptr() and
> > > > vb2_vmalloc_get_userptr() which both completely fail to follow the
> > > > REQUIRED behavior in the function's comment about checking PTEs. It
> > > > just DMA maps them. Badly broken.
> > > >
> > > > Guessing this hackery is for some embedded P2P DMA transfer?
> > >
> > > Yeah, see also the follow_pfn trickery in
> > > videobuf_dma_contig_user_get(), I think this is fully intentional and
> > > userspace abi we can't break :-/
> >
> > We don't need to break uABI, it just needs to work properly in the
> > kernel:
> >
> >   vma = find_vma_intersection()
> >   dma_buf = dma_buf_get_from_vma(vma)
> >   sg = dma_buf_p2p_dma_map(dma_buf)
> >   [.. do dma ..]
> >   dma_buf_unmap(sg)
> >   dma_buf_put(dma_buf)
> >
> > It is as we discussed before, dma buf needs to be discoverable from a
> > VMA, at least for users doing this kind of stuff.
> 
> I'm not a big fan of magic behaviour like this, there's more to
> dma-buf buffer sharing than just "how do I get at the backing
> storage". Thus far we've done everything rather explicitly. Plus with
> exynos and habanalabs converted there's only v4l left over, and that
> has a proper dma-buf import path already.

Well, any VA approach like this has to access some backing refcount
via the VMA. There is not really any way to avoid something like that.
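
To make the lifetime point concrete, here is a rough userspace model of
"reach the backing refcount through the VMA". It is only a sketch: the
mock_* types and functions are hypothetical stand-ins invented for this
illustration, not kernel APIs, and real code would also need locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the exporter's buffer object (not a kernel type). */
struct mock_buf {
	int refcount;	/* references held on the backing storage */
	int released;	/* set to 1 once the last reference is dropped */
};

/* Hypothetical stand-in for a VMA that can name the buffer backing it. */
struct mock_vma {
	unsigned long start, end;	/* covers [start, end) */
	struct mock_buf *buf;		/* NULL when nothing shareable backs it */
};

/* Take a reference on the backing buffer, found only through the VMA. */
static struct mock_buf *mock_buf_get_from_vma(struct mock_vma *vma,
					      unsigned long addr)
{
	if (!vma || addr < vma->start || addr >= vma->end || !vma->buf)
		return NULL;
	vma->buf->refcount++;
	return vma->buf;
}

/* Drop a reference; the storage is released when the count reaches zero. */
static void mock_buf_put(struct mock_buf *buf)
{
	if (--buf->refcount == 0)
		buf->released = 1;
}

/* Simulate munmap(): the mapping forgets the buffer and drops its reference. */
static void mock_vma_unmap(struct mock_vma *vma)
{
	struct mock_buf *buf = vma->buf;

	vma->buf = NULL;
	if (buf)
		mock_buf_put(buf);
}
```

In this model a DMA user that took its reference via
mock_buf_get_from_vma() keeps the storage alive across mock_vma_unmap();
the storage is only released when that user calls mock_buf_put(). A bare
flag on the VMA could not express that.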

> > A VM flag doesn't help - we need to introduce some kind of lifetime,
> > and that has to be derived from the VMA. It needs data not just a flag
> 
> I don't want to make it work, I just want to make it fail. Rough idea
> I have in mind is to add a follow_pfn_longterm, for all callers which
> aren't either synchronized through mmap_sem or an mmu_notifier. 

follow_pfn() doesn't work outside the pagetable locks or mmu notifier
protection. Can't be fixed.
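
As a rough illustration of why a PFN is only meaningful under that kind
of protection, here is a single-threaded userspace model of a
notifier-style sequence check. Everything named mock_* is hypothetical
and made up for this sketch; it is not how any kernel API is spelled,
and real code must do the check and the use under a lock that the
invalidation side also takes.

```c
#include <assert.h>

/* Hypothetical model of an address space with one translated page. */
struct mock_mm {
	unsigned long seq;	/* bumped on every invalidation */
	unsigned long pfn;	/* current translation for the tracked address */
};

/* Snapshot of a lookup: the PFN plus the sequence it was read under. */
struct mock_pfn_ref {
	unsigned long pfn;
	unsigned long seq;
};

/* Look up the PFN and remember which sequence the answer belongs to. */
static struct mock_pfn_ref mock_lookup_pfn(const struct mock_mm *mm)
{
	struct mock_pfn_ref ref = { .pfn = mm->pfn, .seq = mm->seq };

	return ref;
}

/* Simulate an unmap or migration: the translation changes, seq moves on. */
static void mock_invalidate(struct mock_mm *mm, unsigned long new_pfn)
{
	mm->seq++;
	mm->pfn = new_pfn;
}

/* A cached PFN may be used only if no invalidation happened since lookup. */
static int mock_pfn_still_valid(const struct mock_mm *mm,
				const struct mock_pfn_ref *ref)
{
	return ref->seq == mm->seq;
}
```

A caller that holds on to ref.pfn after mock_invalidate() has run is
using a stale translation, and without the sequence (or a lock) it has
no way to notice.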

We only have a few users:

arch/s390/pci/pci_mmio.c:       ret = follow_pfn(vma, user_addr, pfn);
drivers/media/v4l2-core/videobuf-dma-contig.c:          ret = follow_pfn(vma, user_address, &this_pfn);
drivers/vfio/vfio_iommu_type1.c:        ret = follow_pfn(vma, vaddr, pfn);
drivers/vfio/vfio_iommu_type1.c:                ret = follow_pfn(vma, vaddr, pfn);
mm/frame_vector.c:                      err = follow_pfn(vma, start, &nums[ret]);
virt/kvm/kvm_main.c:    r = follow_pfn(vma, addr, &pfn);
virt/kvm/kvm_main.c:            r = follow_pfn(vma, addr, &pfn);

VFIO is broken like media, but I saw patches fixing the vfio cases
using the VMA and a vfio specific refcount.

media and frame_vector are what we are talking about here.

kvm is a similar hack added for P2P DMA, see commit
add6a0cd1c5ba51b201e1361b05a5df817083618. It might be protected by notifiers.

s390 looks broken too; it needs to hold the page table locks.

So, the answer really is that s390 and media need fixing, and this API
should go away (or become kvm specific).

> If this really breaks anyone's use-case we can add a tainting kernel
> option which re-enables this (we've done something similar for
> phys_addr_t based buffer sharing in fbdev, entirely unfixable since
> the other driver has to just blindly trust that what userspace
> passes around is legit). This here isn't unfixable, but if v4l
> people want to keep it without a big "security hole here" sticker,
> they should do the work, not me :-)

This seems fairly reasonable.

So after frame_vec is purged and we have the one caller in media, move
all this stuff to media and taint the kernel if it goes down the
follow_pfn path.

Jason