On Thu, 14 May 2020 17:25:38 -0400
Peter Xu <peterx@xxxxxxxxxx> wrote:

> On Thu, May 14, 2020 at 10:51:46AM -0600, Alex Williamson wrote:
> > This is a follow-on series to "vfio-pci: Block user access to disabled
> > device MMIO"[1], which extends user access blocking of disabled MMIO
> > ranges to include unmapping the ranges from the IOMMU.  The first patch
> > adds an invalidation callback path, allowing vfio bus drivers to signal
> > the IOMMU backend to unmap ranges with vma-level granularity.  This
> > signaling is done both when the MMIO range becomes inaccessible due to
> > memory disabling and when a vma is closed, making up for the lack of
> > tracking or pinning for non-page-backed vmas.  The second patch adds
> > registration and testing interfaces such that the IOMMU backend driver
> > can test whether a given PFNMAP vma is provided by a vfio bus driver
> > supporting invalidation.  We can then implement more restricted
> > semantics to only allow PFNMAP DMA mappings when we have such support,
> > which becomes the new default.
>
> Hi, Alex,
>
> IIUC we'll be directly tearing down the IOMMU page table without telling
> userspace for those PFNMAP pages.  I'm wondering whether there could be
> any side effect on the userspace side when userspace has cached this
> mapping information somehow.  E.g., is there a way for userspace to know
> about this event?
>
> Currently, QEMU VT-d will maintain all the IOVA mappings for each
> assigned device when used with vfio-pci.  In this case, QEMU will
> probably need to depend on invalidations sent from the guest (either
> userspace or kernel) device drivers to invalidate such IOVA mappings
> after they get removed from the hardware IOMMU page table underneath.  I
> haven't thought deeply about what would happen if the vIOMMU ends up with
> an inconsistent mapping of the real device.

Full disclosure, I haven't tested vIOMMU, so there might be issues.  Let's
puzzle through this.  Without a vIOMMU, the vfio MemoryListener in QEMU
makes use of address_space_memory, which is essentially the vCPU view of
memory.  When the memory bit of a PCI device is disabled, QEMU correctly
removes the MMIO regions of the device from this AddressSpace.  When
re-enabled, they get re-added.  In that case, what we're doing here is a
little bit redundant: the IOMMU mappings get dropped in response to the
memory bit, and the subsequent callback from the MemoryListener is
effectively a no-op since the range is already unmapped.  When the memory
bit is re-enabled, the AddressSpace gets updated, the MemoryListener
fires, and we re-fault the mmap as we re-add the IOMMU mapping.

When we have a vIOMMU, the address_space_memory behavior should be the
same as above; the vIOMMU isn't involved in vCPU-to-device access.  So I
think our concern is explicit mappings, much like vfio itself makes.  That
feels like a gap.  When the vma is being closed, I think dropping those
mappings out from under the user is probably still the right approach, and
I think this series would still be useful if we only did that much.  I
think this would also address Jason's primary concern.  It's better to
take an IOMMU fault from the user trying to access those mappings than it
is to leave them in place.

OTOH, if we're dropping mappings in response to disabling the memory bit,
we're changing a potential disabled-MMIO access fault into an IOMMU fault,
where the platform response might very well be fatal in either case.
Maybe we need to look at a more temporary invalidation tied to the memory
enable bit.
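For reference, here's a minimal userspace sketch (not actual QEMU code) of
the unmap/map that the MemoryListener callbacks boil down to at the type1
container when the memory bit toggles.  The container fd, the pre-existing
BAR mmap described by vaddr/iova/size, and the helper names are
assumptions for illustration only:

/*
 * Minimal userspace sketch (not QEMU code): the DMA unmap/map that the
 * vfio MemoryListener effectively performs when a BAR drops out of and
 * returns to the AddressSpace.  Assumes "container" is an open,
 * configured VFIO type1 container fd and that vaddr/iova/size describe
 * the user's existing mmap of the BAR.
 */
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Memory bit cleared: region_del unmaps the BAR's IOVA range. */
int bar_region_del(int container, uint64_t iova, uint64_t size)
{
	struct vfio_iommu_type1_dma_unmap unmap = {
		.argsz = sizeof(unmap),
		.iova = iova,
		.size = size,
	};

	return ioctl(container, VFIO_IOMMU_UNMAP_DMA, &unmap);
}

/* Memory bit set again: region_add re-establishes the IOMMU mapping,
 * which is also where the previously invalidated mmap gets re-faulted. */
int bar_region_add(int container, void *vaddr, uint64_t iova, uint64_t size)
{
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (uintptr_t)vaddr,
		.iova = iova,
		.size = size,
	};

	return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
}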
If we could remap the range to a kernel page, we could maybe avoid the
IOMMU fault and maybe even have a crude test for whether any data was
written to the page while that mapping was in place (i.e., simulating more
restricted error handling, though more asynchronously than it's done at
the platform level).  Let me look into it.  Thanks,

Alex