On Thu, Feb 25, 2021 at 02:06:46PM -0500, Peter Xu wrote: > Agreed. I saw discussions around on redefining the vm_pgoff namespace, I can't > say I followed that closely either, but yes it definitely makes sense to always > use an unified namespace. Maybe we should even comment it somewhere on how > vm_pgoff is encoded? Yes, it should be described, it is subtle > > Correct. VFIO can map into the IOMMU PFNs it can get a reference > > to. pin_user_pages() works for the majority, special VFIO VMAs cover > > the rest, and everthing else must be blocked for security. > > If we all agree that the current follow_pfn() should only apply to vfio > internal vmas, I want to remvoe follow_pfn(). Internal VMAs can deduce the PFN from the vm_pgoff, they don't need to do follow. > then it seems we can drop it indeed, as long as the crash reported > in 5cbf3264b would fail gracefully at e.g. VFIO_IOMMU_MAP_DMA rather > than triggering a kernel warning somehow. Yes, this will just fail the ioctl because pin_user_pages() failed and the VMA was not VFIO. > However I'm still confused on why it's more secure - the current process to do > VFIO_IOMMU_MAP_DMA should at least has proper permission for everything to be > setup, including the special vma, right? Say, if the process can write to > those memories, then shouldn't we also allow it to grant this write permission > to other devices too? It is a use-after-free. Once the PFN is programmed into the IOMMU it becomes completely divorced from the VMA. Remember there is no pin_user_page here, so the PFN has no reference count. If the owner of the VMA decided to zap it or otherwise then the IOMMU access keeps going - but now the owner thinks the PFN is free'd and nobody is referencing it. Goes bad. Jason