On Wed, Apr 29, 2020 at 01:56:33PM -0600, Alex Williamson wrote:
> On Tue, 28 Apr 2020 21:29:03 -0300
> Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> > On Tue, Apr 28, 2020 at 02:12:23PM -0600, Alex Williamson wrote:
> >
> > > > > Maybe I was just getting lucky before this commit. For a
> > > > > VM_PFNMAP, vaddr_get_pfn() only needs pin_user_pages_remote() to return
> > > > > error and the vma information that we setup in vfio_pci_mmap().
> > > >
> > > > I've written on this before, vfio should not be passing pages to the
> > > > iommu that it cannot pin eg it should not touch VM_PFNMAP vma's in the
> > > > first place.
> > > >
> > > > It is a use-after-free security issue the way it is..
> > >
> > > Where is the user after free? Here I'm trying to map device mmio space
> > > through the iommu, which we need to enable p2p when the user owns
> > > multiple devices.
> >
> > Yes, I gathered what the intent was..
> >
> > > The device is owned by the user, bound to vfio-pci, and can't be
> > > unbound while the user has it open. The iommu mappings are torn
> > > down on release. I guess I don't understand the problem.
> >
> > For PFNMAP VMAs the lifecycle rule is basically that the PFN inside
> > the VMA can only be used inside the mmap_sem that read it. Ie you
> > cannot take a PFN outside the mmap_sem and continue to use it.
> >
> > This is because the owner of the VMA owns the lifetime of that PFN,
> > and under the write side of the mmap_sem it can zap the PFN, or close
> > the VMA. Afterwards the VMA owner knows that there are no active
> > reference to the PFN in the system and can reclaim the PFN
> >
> > ie the PFNMAP has no per-page pin counter. All lifetime revolves around
> > the mmap_sem and the vma.
> >
> > What vfio does is take the PFN out of the mmap_sem and program it into
> > the iommu.
> >
> > So when the VMA owner decides the PFN has no references, it actually
> > doesn't: vfio continues to access it beyond its permitted lifetime.
> >
> > HW like mlx5 and GPUs have BAR pages which have security
> > properties. Once the PFN is returned to the driver the security
> > context of the PFN can be reset and re-assigned to another
> > process. Using VFIO a hostile user space can retain access to the BAR
> > page and upon its reassignment access a security context they were not
> > permitted to access.
> >
> > This is why GUP does not return PFNMAP pages and vfio should not carry
> > a reference outside the mmap_sem. It breaks all the lifetime rules.
>
> Thanks for the explanation. I'm inferring that there is no solution to
> this,

Not a particularly good one unfortunately. I've been wanting to use
P2P_DMA pages to solve these kinds of things but they are kind of
expensive.

I have a copy of some draft patches trying to do this

> but why can't we use mmu notifiers to invalidate the iommu on zap or
> close?

Hum.. I think with the new mmu interval notifiers vfio might be able
to manage that without a huge amount of trouble. But the iommu
invalidation needs to be synchronous from a mmu notifier callback - is
that feasible?

But even so, we have all this stuff now for authorizing PCI P2P which
this design completely ignores as well. :(

> I know that at least QEMU won't consider these sorts of mapping
> fatal, so we could possibly change the default and make support for
> such mappings opt-in, but I don't know if I'd break DPDK, or
> potentially users within QEMU that make use of p2p between devices.

I'd heard this was mostly for GPU device assignment? I'd be surprised
if DPDK used this..

Jason