On Thu, 16 Jan 2020 03:15:58 +0000
Mika Penttilä <mika.penttila@xxxxxxxxxxxx> wrote:

> On 16.1.2020 4.59, Alex Williamson wrote:
> > On Thu, 16 Jan 2020 02:30:52 +0000
> > Mika Penttilä <mika.penttila@xxxxxxxxxxxx> wrote:
> > 
> >> On 15.1.2020 22.06, Alex Williamson wrote:
> >>> On Tue, 14 Jan 2020 22:53:03 -0500
> >>> Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
> >>> 
> >>>> vfio_dma_rw will read/write a range of user space memory pointed to by
> >>>> IOVA into/from a kernel buffer without pinning the user space memory.
> >>>> 
> >>>> TODO: mark the IOVAs to user space memory dirty if they are written in
> >>>> vfio_dma_rw().
> >>>> 
> >>>> Cc: Kevin Tian <kevin.tian@xxxxxxxxx>
> >>>> Signed-off-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>
> >>>> ---
> >>>>  drivers/vfio/vfio.c             | 45 +++++++++++++++++++
> >>>>  drivers/vfio/vfio_iommu_type1.c | 76 +++++++++++++++++++++++++++++++++
> >>>>  include/linux/vfio.h            |  5 +++
> >>>>  3 files changed, 126 insertions(+)
> >>>> 
> >>>> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> >>>> index c8482624ca34..8bd52bc841cf 100644
> >>>> --- a/drivers/vfio/vfio.c
> >>>> +++ b/drivers/vfio/vfio.c
> >>>> @@ -1961,6 +1961,51 @@ int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, int npage)
> >>>>  }
> >>>>  EXPORT_SYMBOL(vfio_unpin_pages);
> >>>>  
> >>>> +/*
> >>>> + * Read/Write a range of IOVAs pointing to user space memory into/from a kernel
> >>>> + * buffer without pinning the user space memory
> >>>> + * @dev [in]  : device
> >>>> + * @iova [in] : base IOVA of a user space buffer
> >>>> + * @data [in] : pointer to kernel buffer
> >>>> + * @len [in]  : kernel buffer length
> >>>> + * @write     : indicate read or write
> >>>> + * Return error code on failure or 0 on success.
> >>>> + */
> >>>> +int vfio_dma_rw(struct device *dev, dma_addr_t iova, void *data,
> >>>> +		size_t len, bool write)
> >>>> +{
> >>>> +	struct vfio_container *container;
> >>>> +	struct vfio_group *group;
> >>>> +	struct vfio_iommu_driver *driver;
> >>>> +	int ret = 0;
> >> Do you know the iova given to vfio_dma_rw() is indeed a gpa and not iova
> >> from a iommu mapping? So isn't it you actually assume all the guest is
> >> pinned,
> >> like from device assignment?
> >> 
> >> Or who and how is the vfio mapping added before the vfio_dma_rw() ?
> > vfio only knows about IOVAs, not GPAs.  It's possible that IOVAs are
> > identity mapped to the GPA space, but a VM with a vIOMMU would quickly
> > break any such assumption.  Pinning is also not required.  This access
> > is via the CPU, not the I/O device, so we don't require the memory to
> > be pinning and it potentially won't be for a non-IOMMU backed mediated
> > device.  The intention here is that via the mediation of an mdev
> > device, a vendor driver would already know IOVA ranges for the device
> > to access via the guest driver programming of the device.  Thanks,
> > 
> > Alex
>
> Thanks Alex... you mean IOVA is in the case of iommu already a
> iommu-translated address to a user space VA in VM host space?

The user (QEMU in the case of device assignment) performs ioctls to map
user VAs to IOVAs for the device.  With IOMMU backing, the VAs are
pinned to get the HPAs, and the IOVA-to-HPA mappings are programmed
into the IOMMU.  Thus the device accesses the IOVA to get to the HPA,
which is the backing for the VA.  In this case we're simply using the
IOVA to look up the VA and access it with the CPU directly.  The IOMMU
isn't involved, but we're still performing an access as if we were the
device doing a DMA.  Let me know if that doesn't answer your question.

> How does it get to hold on that? What piece of meditation is responsible
> for this?

It's device specific.
The mdev vendor driver is mediating a specific hardware device where
user accesses to MMIO on the device configure DMA targets.  The
mediation needs to trap those accesses in order to pin pages and
program the real hardware with real physical addresses (be they HPAs
or host IOVAs, depending on the host IOMMU config) to perform those
DMAs.  For cases where the CPU might choose to perform some sort of
virtual DMA on behalf of the device itself, this interface would be
used.  Thanks,

Alex
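To make the "use the IOVA to look up the VA and access it with the CPU" step concrete, here is a minimal userspace C model. It assumes a flat array of mappings like those a user would create with the VFIO_IOMMU_MAP_DMA ioctl; the names struct dma_map, find_map() and sim_dma_rw() are hypothetical. This is not the vfio_iommu_type1 code, which keeps its own mapping structures and presumably has to reach the VA through the owning process's address space rather than a plain pointer. The copy is split at mapping boundaries because a contiguous IOVA range may be backed by unrelated VAs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record of one user mapping: IOVA range -> user VA. */
struct dma_map {
	uint64_t iova;  /* start of the IOVA range */
	void *vaddr;    /* user VA backing that range */
	size_t size;    /* length in bytes */
};

/* Return the mapping containing iova, or NULL if it is unmapped. */
static const struct dma_map *find_map(const struct dma_map *maps, size_t n,
				      uint64_t iova)
{
	for (size_t i = 0; i < n; i++)
		if (iova >= maps[i].iova && iova - maps[i].iova < maps[i].size)
			return &maps[i];
	return NULL;
}

/*
 * Copy len bytes between data and the memory behind [iova, iova + len)
 * with plain CPU accesses; write == true copies data -> user memory.
 * Nothing is pinned and no IOMMU is consulted. On an unmapped IOVA it
 * returns -EINVAL; bytes already copied are not rolled back.
 */
static int sim_dma_rw(const struct dma_map *maps, size_t n, uint64_t iova,
		      void *data, size_t len, bool write)
{
	unsigned char *buf = data;

	assert(data != NULL || len == 0);
	while (len) {
		const struct dma_map *m = find_map(maps, n, iova);
		if (!m)
			return -EINVAL;

		/* Clamp to the end of this mapping: the next IOVA range
		 * may be backed by a different VA. */
		size_t off = (size_t)(iova - m->iova);
		size_t chunk = m->size - off;
		if (chunk > len)
			chunk = len;

		unsigned char *va = (unsigned char *)m->vaddr + off;
		if (write)
			memcpy(va, buf, chunk);
		else
			memcpy(buf, va, chunk);

		iova += chunk;
		buf += chunk;
		len -= chunk;
	}
	return 0;
}
```

For example, with two separate 16-byte buffers mapped at IOVAs 0x1000 and 0x1010, a 12-byte write at IOVA 0x100c lands as 4 bytes at the end of the first buffer and 8 bytes at the start of the second, even though the two VAs are unrelated.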
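The mediation side can be sketched in the same spirit. Everything here is hypothetical: the register layout (REG_DMA_ADDR, REG_DMA_LEN, REG_DOORBELL), struct mdev_state and mdev_mmio_write() are made up, and the dma_rw callback merely stands in for vfio_dma_rw(). The trap handler records the IOVA the guest driver programs into the emulated device and, on a doorbell write, performs the "DMA" with the CPU. The real path of pinning pages and programming physical hardware is deliberately left out, and flat_dma_rw() is only a toy backend for trying the handler out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register layout of the emulated MMIO region. */
enum { REG_DMA_ADDR = 0x00, REG_DMA_LEN = 0x08, REG_DOORBELL = 0x10 };

/* Stand-in for vfio_dma_rw(): CPU access to guest memory by IOVA. */
typedef int (*dma_rw_fn)(void *ctx, uint64_t iova, void *data, size_t len,
			 bool write);

struct mdev_state {
	uint64_t dma_addr;        /* IOVA programmed by the guest driver */
	uint64_t dma_len;         /* length programmed by the guest driver */
	unsigned char status[8];  /* data the emulated device "DMAs" out */
	dma_rw_fn dma_rw;
	void *dma_ctx;
};

/* Trap handler for guest writes to the emulated MMIO region. */
static int mdev_mmio_write(struct mdev_state *s, uint64_t off, uint64_t val)
{
	assert(s->dma_rw != NULL);
	switch (off) {
	case REG_DMA_ADDR:
		s->dma_addr = val;  /* the vendor driver now knows this IOVA */
		return 0;
	case REG_DMA_LEN:
		s->dma_len = val;
		return 0;
	case REG_DOORBELL:
		if (s->dma_len > sizeof(s->status))
			return -EINVAL;
		/* Virtual DMA: the CPU writes the status block to the IOVA. */
		return s->dma_rw(s->dma_ctx, s->dma_addr, s->status,
				 (size_t)s->dma_len, true);
	default:
		return -EINVAL;
	}
}

/* Toy backend: one flat buffer mapped at IOVA base (testing only). */
struct flat_mem {
	uint64_t base;
	unsigned char *buf;
	size_t size;
};

static int flat_dma_rw(void *ctx, uint64_t iova, void *data, size_t len,
		       bool write)
{
	struct flat_mem *m = ctx;

	if (iova < m->base || iova - m->base > m->size ||
	    len > m->size - (size_t)(iova - m->base))
		return -EFAULT;
	unsigned char *p = m->buf + (iova - m->base);
	if (write)
		memcpy(p, data, len);
	else
		memcpy(data, p, len);
	return 0;
}
```

The point of the split is that the vendor driver only ever learns IOVAs from trapped register writes; whether those IOVAs end up in real hardware (after pinning) or are serviced by the CPU through the rw callback is a per-device decision.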