On Mon, Oct 5, 2020 at 8:54 PM Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
>
> On Mon, Oct 5, 2020 at 8:37 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> >
> > On Mon, Oct 05, 2020 at 08:16:33PM +0200, Daniel Vetter wrote:
> >
> > > > kvm is some similar hack added for P2P DMA, see commit
> > > > add6a0cd1c5ba51b201e1361b05a5df817083618. It might be protected by notifiers..
> > >
> > > Yeah my thinking is that kvm (and I think also vfio, also seems to
> > > have mmu notifier nearby) are ok because of the mmu notifier. Assuming
> > > that one works correctly.
> >
> > vfio doesn't have a notifier, Alex was looking to add a vfio private
> > scheme in the vma->private_data:
> >
> > https://lore.kernel.org/kvm/159017449210.18853.15037950701494323009.stgit@xxxxxxxxxx/
> >
> > Guess it never happened.
>
> I was misled by the mmu notifier in drivers/vfio/vfio.c. But looking
> closer, that's only used by some drivers, I guess to make sure their
> device pagetables are kept in sync with reality. And not to make sure
> the vfio pfn view is kept in sync with reality.
>
> This could get real nasty I think.
>
> > > > So, the answer really is that s390 and media need fixing, and this API
> > > > should go away (or become kvm specific)
> > >
> > > I'm still not clear how you want to fix this, since your vma->dma_buf
> > > idea is kinda a decade long plan and so just not going to happen:
> >
> > Well, it doesn't mean we have to change every part of dma_buf to
> > participate in this. Just the bits media cares about. Or maybe it is
> > some higher level variant on top of dma_buf.
> >
> > Or don't use dma_buf for this, add a new object that just provides
> > refcounts and P2P DMA connection for IO pfn ranges..
>
> So good news is, I dug some layers deeper in v4l, and there's only 2
> users which do actually handle pfn and don't immediately convert to a
> pages array:
> - videobuf-dma-contig.c.
> Luckily videobuf 1 is deprecated since forever, so I think we might
> get away with either just breaking this, or at least tainting kernels
> and hiding it behind a nasty Kconfig. This only uses follow_pfn, which
> we need to keep anyway for vfio in the unsafe variant :-/
> - videobuf2-vmalloc.c Digging through history this was added to support
> import of v4l buffers from drivers that needed contig memory. And way
> back before CMA, that meant carveout memory not backed by struct page *.
> That should now all have struct pages and be managed by CMA (since
> videobuf2-dma-contig.c just uses dma_alloc_coherent underneath), so I
> think we can just switch to pin_user_pages(FOLL_LONGTERM) here too.
>
> iow I think I can outright delete the frame vector stuff.

Ok this doesn't work, because dma_mmap always uses remap_pfn_range,
which gives a VM_IO | VM_PFNMAP vma, and so even if it's cma backed and
not a carveout, we can't get the pages.

Plus trying to move the cma pages out of cma for FOLL_LONGTERM would
be kinda bad when they've been allocated as a contig block by
dma_alloc_coherent :-)

So this idea of switching over to pup only is going to break zerocopy.
I guess I'll need something else for this then.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
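
To illustrate why pinning fails on a dma_mmap'ed buffer, here is a rough
userspace model of the vma flag check, as far as I understand the gup
side. Everything prefixed model_/MODEL_ is hypothetical and made up for
this sketch (it is not the kernel's code or API), and the flag values
are only stand-ins:

```c
#include <errno.h>

/* Stand-ins for the kernel's vma flags; values only matter inside this model. */
#define MODEL_VM_PFNMAP 0x00000400UL
#define MODEL_VM_IO     0x00004000UL

/* Hypothetical, stripped-down vma: only the flags are modelled. */
struct model_vma {
	unsigned long vm_flags;
};

/*
 * Hypothetical helper modelling the flag check on the pinning path.
 * A vma set up through remap_pfn_range() carries VM_IO | VM_PFNMAP,
 * and pinning is refused for such a vma regardless of whether the
 * memory behind it happens to have struct pages (e.g. CMA).
 */
static int model_can_pin(const struct model_vma *vma)
{
	if (vma->vm_flags & (MODEL_VM_IO | MODEL_VM_PFNMAP))
		return -EFAULT;
	return 0;
}
```

In this model a vma with no special flags returns 0, while one flagged
MODEL_VM_IO | MODEL_VM_PFNMAP returns -EFAULT, which is the case a
userspace mapping of a dma_alloc_coherent buffer ends up in.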