On 2/7/20 9:08 PM, Jason Gunthorpe wrote: > On Fri, Jan 10, 2020 at 07:03:12PM +0000, Joao Martins wrote: >> From: Nikita Leshenko <nikita.leshchenko@xxxxxxxxxx> >> >> Unconditionally interpreting vm_pgoff as a PFN is incorrect. >> >> VMAs created by /dev/mem do this, but in general VM_PFNMAP just means >> that the VMA doesn't have an associated struct page and is being managed >> directly by something other than the core mmu. >> >> Use follow_pfn like KVM does to find the PFN. >> >> Signed-off-by: Nikita Leshenko <nikita.leshchenko@xxxxxxxxxx> >> drivers/vfio/vfio_iommu_type1.c | 6 +++--- >> 1 file changed, 3 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c >> index 2ada8e6cdb88..1e43581f95ea 100644 >> +++ b/drivers/vfio/vfio_iommu_type1.c >> @@ -362,9 +362,9 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr, >> vma = find_vma_intersection(mm, vaddr, vaddr + 1); >> >> if (vma && vma->vm_flags & VM_PFNMAP) { >> - *pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff; >> - if (is_invalid_reserved_pfn(*pfn)) >> - ret = 0; >> + ret = follow_pfn(vma, vaddr, pfn); >> + if (!ret && !is_invalid_reserved_pfn(*pfn)) >> + ret = -EOPNOTSUPP; >> } > > FWIW this existing code is a huge hack and a security problem. > > I'm not sure how you could be successfully using this path on actual > memory without hitting bad bugs? > ATM I think this codepath is largelly hit at the moment for MMIO (GPU passthrough, or mdev). In the context of this patch, guest memory would be treated similarly meaning the device-dax backing memory wouldn't have a 'struct page' (as introduced in this series). > Fudamentally VFIO can't retain a reference to a page from within a VMA > without some kind of recount/locking/etc to allow the thing that put > the page there to know it is still being used (ie programmed in a > IOMMU) by VFIO. > > Otherwise it creates use-after-free style security problems on the > page. > I take it you're referring to the past problems with long term page pinning + fsdax? Or you had something else in mind, perhaps related to your LSFMM topic? Here the memory can't be used by the kernel (and there's no struct page) except from device-dax managing/tearing/driving the pfn region (which is static and the underlying PFNs won't change throughout device lifetime), and vfio pinning/unpinning the pfns (which are refcounted against multiple map/unmaps); > This code needs to be deleted, not extended :( To some extent it isn't really an extension: the patch was just removing the assumption @vm_pgoff being the 'start pfn' on PFNMAP vmas. This is also similarly done by get_vaddr_frames(). Joao