> From: Jean-Philippe Brucker
> Sent: Saturday, April 8, 2017 3:18 AM
>
> This is the initial proposal for a paravirtualized IOMMU device using
> virtio transport. It contains a description of the device, a Linux
> driver, and a toy implementation in kvmtool. With this prototype, you
> can translate DMA to guest memory from emulated (virtio) or
> passed-through (VFIO) devices.
>
> In its simplest form, implemented here, the device handles map/unmap
> requests from the guest. Future extensions proposed in "RFC 3/3"
> should allow binding page tables to devices.
>
> There are a number of advantages to a paravirtualized IOMMU over a
> full emulation. It is portable and could be reused on different
> architectures. It is easier to implement than a full emulation, with
> less state tracking. It might be more efficient in some cases, with
> fewer context switches to the host and the possibility of in-kernel
> emulation.
>
> When designing it and writing the kvmtool device, I considered two
> main scenarios, illustrated below.
>
> Scenario 1: a hardware device passed through twice via VFIO
>
> MEM____pIOMMU________PCI device________________________        HARDWARE
>           |     (2b)                                    \
> ----------|-------------+-------------+------------------\-------------
>           |             :     KVM     :                   \
>           |             :             :                    \
>      pIOMMU drv         : _______virtio-iommu drv           \   KERNEL
>           |             :        |    :          |           \
>         VFIO            :        |    :        VFIO           \
>           |             :        |    :          |             \
>           |             :        |    :          |             /
> ----------|-------------+--------|----+----------|------------/--------
>           |                      |    :          |          /
>           |   (1c)          (1b) |    :     (1a) |        / (2a)
>           |                      |    :          |      /
>           |                      |    :          |    /    USERSPACE
>           |___virtio-iommu dev___|    :    net drv___/
>                                       :
> --------------------------------------+--------------------------------
>                 HOST                  :          GUEST

Usually people draw such layers in the reverse order, i.e. hardware at
the bottom, kernel in the middle, and userspace at the top. :-)

> (1) a. Guest userspace is running a net driver (e.g. DPDK). It
>        allocates a buffer with mmap, obtaining virtual address VA. It
>        then sends a VFIO_IOMMU_MAP_DMA request to map VA to an IOVA
>        (possibly VA=IOVA).
>     b. The mapping request is relayed to the host through virtio
>        (VIRTIO_IOMMU_T_MAP).
>     c. The mapping request is relayed to the physical IOMMU through
>        VFIO.
>
> (2) a. The guest userspace driver can now instruct the device to
>        directly access the buffer at IOVA.
>     b. IOVA accesses from the device are translated into physical
>        addresses by the IOMMU.
>
> Scenario 2: a virtual net device behind a virtual IOMMU
>
> MEM__pIOMMU___PCI device                              HARDWARE
>        |         |
> -------|---------|------+-------------+-------------------------------
>        |         |      :     KVM     :
>        |         |      :             :
>   pIOMMU drv     |      :             :
>         \        |      :     _____________virtio-net drv       KERNEL
>          \_net drv      :     |       :     / (1a)
>                 |       :     |       :    /
>                tap      :     |    ________virtio-iommu drv
>                 |       :     |   |   :  (1b)
> -----------------|------+-----|---|---+-------------------------------
>                 |             |   |   :
>                 |_virtio-net_|   |   :
>                /  (2)            |   :
>               /                  |   :                     USERSPACE
>            virtio-iommu dev______|   :
>                                      :
> --------------------------------------+-------------------------------
>                 HOST                  :             GUEST
>
> (1) a. The guest virtio-net driver maps the virtio ring and a buffer.
>     b. The mapping requests are relayed to the host through virtio.
> (2) The virtio-net device now needs to access any guest memory via
>     the IOMMU.
>
> Physical and virtual IOMMUs are completely dissociated. The net
> driver maps its own buffers via the DMA/IOMMU API, and buffers are
> copied between virtio-net and tap.
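To make the map path concrete, step (1)a of scenario 1 boils down to
something like the sketch below from guest userspace. This is only a
sketch against the VFIO type1 UAPI: container setup (group attach,
VFIO_SET_IOMMU) and error handling are omitted, and 'container' is
assumed to be an already-configured VFIO container fd.

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

/* Allocate a buffer and ask VFIO to map it at a chosen IOVA.  In the
 * guest, this is the ioctl that ends up relayed to the host as a
 * VIRTIO_IOMMU_T_MAP request in step (1)b.
 */
static int map_dma_buffer(int container, uint64_t iova, size_t size)
{
        struct vfio_iommu_type1_dma_map map;
        void *va;

        va = mmap(NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (va == MAP_FAILED)
                return -1;

        memset(&map, 0, sizeof(map));
        map.argsz = sizeof(map);
        map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
        map.vaddr = (uintptr_t)va;
        map.iova  = iova;       /* possibly iova == vaddr */
        map.size  = size;

        return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
}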
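Step (1)a of scenario 2 instead goes through the guest kernel's regular
DMA API, roughly as follows (again only a sketch; dev, buf and len
stand in for the driver's own state):

#include <linux/dma-mapping.h>

/* Map a driver buffer for device access.  With the virtio-iommu driver
 * bound to this device, the mapping ends up being relayed as a MAP
 * request on the request virtqueue instead of programming IOMMU
 * hardware directly.
 */
static int map_tx_buffer(struct device *dev, void *buf, size_t len,
                         dma_addr_t *iova)
{
        *iova = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
        if (dma_mapping_error(dev, *iova))
                return -ENOMEM;

        return 0;       /* the device may now DMA to *iova */
}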
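Either way, what reaches the host in step (1)b is a virtio-iommu MAP
request on the virtqueue, along the lines below. This layout is
illustrative only; the authoritative definition and field names are in
RFC 2/3.

/* Every request begins with a type and ends with a status byte written
 * by the device.  A MAP request names an address space, an IOVA range
 * and the guest-physical address backing it.
 */
struct virtio_iommu_req_head {
        __u8    type;           /* e.g. VIRTIO_IOMMU_T_MAP */
        __u8    reserved[3];
};

struct virtio_iommu_req_tail {
        __u8    status;         /* written by the device */
        __u8    reserved[3];
};

struct virtio_iommu_req_map {
        struct virtio_iommu_req_head    head;
        __le32  address_space;  /* which IOVA space to modify */
        __le64  virt_addr;      /* IOVA */
        __le64  phys_addr;      /* guest-physical address */
        __le64  size;
        __le32  flags;          /* read/write permissions */
        struct virtio_iommu_req_tail    tail;
};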
> The description itself seemed too long for a single email, so I split
> it into three documents, and will attach Linux and kvmtool patches to
> this email.
>
> 1. Firmware note,
> 2. device operations (draft for the virtio specification),
> 3. future work/possible improvements.
>
> Just to be clear on the terms I'm using:
>
> pIOMMU  physical IOMMU, controlling DMA accesses from physical
>         devices
> vIOMMU  virtual IOMMU (virtio-iommu), controlling DMA accesses from
>         physical and virtual devices to guest memory.

Maybe it is clearer to say that the vIOMMU controls 'virtual' DMA
accesses, since we're essentially doing DMA virtualization here.
Otherwise I find it a bit confusing, since DMA accesses from physical
devices should be controlled by the pIOMMU.

> GVA, GPA, HVA, HPA
>         Guest/Host Virtual/Physical Address
> IOVA    I/O Virtual Address, the address accessed by a device doing
>         DMA through an IOMMU. In the context of a guest OS, IOVA is
>         GVA.

This statement is not accurate. For kernel DMA protection, the IOVA
space is a standalone per-device address space (definitely nothing to
do with GVA). For user DMA protection, the userspace driver decides
how it wants to construct the IOVA address space: it could be a
standalone one, or it could reuse GVA. In the virtualization case it
is either GPA (without a vIOMMU) or guest IOVA (with a vIOMMU, where
the guest creates the IOVA space). Anyway, the IOVA concept itself is
clear; the text is possibly still clear with the example simply
removed. :-)

> Note: kvmtool is GPLv2. Linux patches are GPLv2, except for the UAPI
> virtio-iommu.h header, which is BSD 3-clause. For the time being, the
> specification draft in RFC 2/3 is also BSD 3-clause.
>
> This proposal may be inadvertently centered on ARM architectures at
> times. Any feedback would be appreciated, especially regarding other
> IOMMU architectures.

Thanks for doing this. I will definitely look at them in detail and
provide feedback.

Thanks,
Kevin