On Wed, 8 Aug 2018 11:45:43 +0800
Peter Xu <peterx@xxxxxxxxxx> wrote:

> On Wed, Aug 08, 2018 at 12:58:32AM +0300, Michael S. Tsirkin wrote:
> > At least with VTD, it seems entirely possible to change e.g. a PMD
> > atomically to point to a different set of PTEs, then flush.
> > That will allow removing memory at high granularity for
> > an arbitrary device without mdev or PASID dependency.
>
> My understanding is that the guest driver should prohibit this kind of
> operation (say, modifying PMD).

There's currently no need for this sort of operation within the dma api,
and the iommu api doesn't offer it either.

> Actually I don't see how it can
> happen in Linux if the kernel drivers always call the IOMMU API since
> there are only map/unmap APIs rather than this atomic-modify API.

Exactly, the vfio dma mapping api is just an extension of the iommu api
and there's only map and unmap.  Furthermore, unmap can currently return
more than requested if the original mapping made use of superpages in
the iommu, so the only way to achieve page-level granularity is to make
only page-size mappings.  Otherwise we're talking about new apis across
the board.

> The thing is that IMHO it's the guest driver's responsibility to make
> sure the pages will never be used by the device before it removes the
> entry (including modifying the PMD since that actually removes all the
> entries on the old PMD).  If not, I would see it a guest kernel bug
> instead of the bug in the emulation code.

This is why there is no atomic modify in the dma api; we have drivers
that directly manage the buffers for a device and know when a buffer is
in use and when it's not.  There's never a need, currently, to replace
the iova mapping for a single page within a larger buffer.  Maybe the
dma api could also find use for it, but it seems more specific to the
iommu api, where our "buffer" happens to be a contiguous RAM region for
the VM and we do want to change the mapping of a single page within it.
That single page might currently be mapped by a 2MB or 1GB page in the
case of Intel, or by an arbitrary page size in the case of AMD.  vfio is
the driver managing these mappings, but unlike the dma api case, we
don't have any insight into the device's behavior, including in-flight
dma.  We can stop all dma for the device, but not without interfering
with and potentially breaking the behavior of the device.

So again, I think this comes down to new iommu driver support, new
iommu apis, and new vfio apis to enable some sort of atomic-update
interface, or to sacrificing performance and adding bloat by forcing
page-size mappings.  Thanks,

Alex
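
For illustration, a minimal userspace sketch of the vfio type1 map/unmap
interface described above; it assumes a container fd that has already
been set up with an iommu group and the type1 backend, and the helper
names and the iova/vaddr/size values are placeholders, not code from the
thread:

	/*
	 * Sketch of the vfio type1 dma mapping interface: there is only
	 * MAP_DMA and UNMAP_DMA, no atomic-modify of an existing mapping.
	 */
	#include <linux/vfio.h>
	#include <sys/ioctl.h>
	#include <stdio.h>

	static int map_region(int container, void *vaddr, __u64 iova, __u64 size)
	{
		struct vfio_iommu_type1_dma_map map = {
			.argsz = sizeof(map),
			.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
			.vaddr = (__u64)(unsigned long)vaddr,
			.iova  = iova,
			.size  = size,
		};

		/* The iommu driver may back this with 2MB/1GB superpages. */
		return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
	}

	static int unmap_region(int container, __u64 iova, __u64 size)
	{
		struct vfio_iommu_type1_dma_unmap unmap = {
			.argsz = sizeof(unmap),
			.iova  = iova,
			.size  = size,
		};
		int ret = ioctl(container, VFIO_IOMMU_UNMAP_DMA, &unmap);

		/*
		 * On return, unmap.size reports how much was actually unmapped,
		 * which can exceed the request if the original mapping used
		 * superpages -- hence page-level granularity requires making
		 * only page-size mappings in the first place.
		 */
		if (!ret)
			printf("unmapped 0x%llx bytes\n",
			       (unsigned long long)unmap.size);
		return ret;
	}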
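
Likewise, a sketch against the in-kernel iommu api (include/linux/iommu.h,
4.18-era signatures) of how "replacing" one page inside a larger mapping
would have to be expressed today; the function name and the domain/iova/
new_phys parameters are hypothetical:

	#include <linux/iommu.h>
	#include <linux/mm.h>
	#include <linux/errno.h>

	static int replace_one_page(struct iommu_domain *domain,
				    unsigned long iova, phys_addr_t new_phys)
	{
		size_t unmapped;

		/* May tear down a superpage covering far more than 4K. */
		unmapped = iommu_unmap(domain, iova, PAGE_SIZE);
		if (unmapped < PAGE_SIZE)
			return -EINVAL;

		/*
		 * Window here: the iova (and anything else the superpage
		 * covered) has no translation until the new mapping is
		 * installed -- there is no atomic-modify in this api.
		 */
		return iommu_map(domain, iova, new_phys, PAGE_SIZE,
				 IOMMU_READ | IOMMU_WRITE);
	}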