Re: [RFC 0/3] virtio-iommu: a paravirtualized IOMMU

Jean-Philippe Brucker <jean-philippe.brucker@xxxxxxx> · Thu, 13 Apr 2017 14:12:59 +0100

On 13/04/17 09:41, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker
>> Sent: Saturday, April 8, 2017 3:18 AM
>>
>> This is the initial proposal for a paravirtualized IOMMU device using
>> virtio transport. It contains a description of the device, a Linux driver,
>> and a toy implementation in kvmtool. With this prototype, you can
>> translate DMA to guest memory from emulated (virtio), or passed-through
>> (VFIO) devices.
>>
>> In its simplest form, implemented here, the device handles map/unmap
>> requests from the guest. Future extensions proposed in "RFC 3/3" should
>> allow binding page tables to devices.
>>
>> There are a number of advantages in a paravirtualized IOMMU over a full
>> emulation. It is portable and could be reused on different architectures.
>> It is easier to implement than a full emulation, with less state tracking.
>> It might be more efficient in some cases, with fewer context switches to
>> the host and the possibility of in-kernel emulation.
>>
>> When designing it and writing the kvmtool device, I considered two main
>> scenarios, illustrated below.
>>
>> Scenario 1: a hardware device passed through twice via VFIO
>>
>>    MEM____pIOMMU________PCI device________________________      HARDWARE
>>             |     (2b)                                    \
>>   ----------|-------------+-------------+------------------\-------------
>>             |             :     KVM     :                   \
>>             |             :             :                    \
>>        pIOMMU drv         :         _______virtio-iommu drv   \    KERNEL
>>             |             :        |    :          |           \
>>           VFIO            :        |    :        VFIO           \
>>             |             :        |    :          |             \
>>             |             :        |    :          |             /
>>   ----------|-------------+--------|----+----------|------------/--------
>>             |                      |    :          |           /
>>             | (1c)            (1b) |    :     (1a) |          / (2a)
>>             |                      |    :          |         /
>>             |                      |    :          |        /   USERSPACE
>>             |___virtio-iommu dev___|    :        net drv___/
>>                                         :
>>   --------------------------------------+--------------------------------
>>                  HOST                   :             GUEST
>>
> 
> Usually people draw such layers in reverse order, e.g. hw at the
> bottom, then kernel in the middle, then user at the top. :-)

Alright, I'll keep that in mind.

>> (1) a. Guest userspace is running a net driver (e.g. DPDK). It allocates a
>>        buffer with mmap, obtaining virtual address VA. It then sends a
>>        VFIO_IOMMU_MAP_DMA request to map VA to an IOVA (possibly VA=IOVA).
>>     b. The mapping request is relayed to the host through virtio
>>        (VIRTIO_IOMMU_T_MAP).
>>     c. The mapping request is relayed to the physical IOMMU through VFIO.
>>
>> (2) a. The guest userspace driver can now instruct the device to directly
>>        access the buffer at IOVA.
>>     b. IOVA accesses from the device are translated into physical
>>        addresses by the IOMMU.
>>
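
As a rough illustration of step (1a), the mapping request issued by the
guest userspace driver could look something like the sketch below. It
assumes a container file descriptor that was already opened on
/dev/vfio/vfio, with a type1 IOMMU backend selected and the device's group
attached; the helper name map_buffer is made up for this example and does
not come from the patches.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: map `size` bytes at virtual address `va` to `iova`
 * in a VFIO container. `container_fd` is assumed to be a fully set up
 * VFIO container (type1 backend). Returns 0 on success, or -1 with errno
 * set on failure.
 */
static int map_buffer(int container_fd, void *va, uint64_t iova, uint64_t size)
{
	struct vfio_iommu_type1_dma_map map;

	assert(size > 0);	/* an empty mapping makes no sense here */

	memset(&map, 0, sizeof(map));
	map.argsz = sizeof(map);
	map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
	map.vaddr = (uintptr_t)va;	/* VA obtained from mmap */
	map.iova  = iova;		/* address the device will use for DMA */
	map.size  = size;

	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}
```

Passing iova = (uintptr_t)va gives the "VA=IOVA" case mentioned in (1a).
In the guest, this ioctl is what ends up being turned into a
VIRTIO_IOMMU_T_MAP request in step (1b).
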
>> Scenario 2: a virtual net device behind a virtual IOMMU.
>>
>>   MEM__pIOMMU___PCI device                                     HARDWARE
>>          |         |
>>   -------|---------|------+-------------+-------------------------------
>>          |         |      :     KVM     :
>>          |         |      :             :
>>     pIOMMU drv     |      :             :
>>              \     |      :      _____________virtio-net drv      KERNEL
>>               \_net drv   :     |       :          / (1a)
>>                    |      :     |       :         /
>>                   tap     :     |    ________virtio-iommu drv
>>                    |      :     |   |   : (1b)
>>   -----------------|------+-----|---|---+-------------------------------
>>                    |            |   |   :
>>                    |_virtio-net_|   |   :
>>                          / (2)      |   :
>>                         /           |   :                      USERSPACE
>>               virtio-iommu dev______|   :
>>                                         :
>>   --------------------------------------+-------------------------------
>>                  HOST                   :             GUEST
>>
>> (1) a. Guest virtio-net driver maps the virtio ring and a buffer.
>>     b. The mapping requests are relayed to the host through virtio.
>> (2) The virtio-net device now needs to access any guest memory via the
>>     IOMMU.
>>
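
To make step (2) a little more concrete, here is a toy model of what a
device-side address space might do: record map/unmap requests, then
translate each access made by the virtual device. This is only a sketch
under simplifying assumptions (a fixed-size table, unmap of whole ranges
only). The names address_space, as_map, as_unmap and as_translate are
invented for this example and are not the kvmtool implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPINGS 16

/* One contiguous IOVA range and the guest-physical range backing it. */
struct mapping {
	uint64_t iova;
	uint64_t gpa;
	uint64_t size;
	bool used;
};

/* Hypothetical per-device address space with a fixed number of slots. */
struct address_space {
	struct mapping maps[MAX_MAPPINGS];
};

/* MAP-style request. Returns 0 on success, -1 on bad range, overlap or full table. */
static int as_map(struct address_space *as, uint64_t iova, uint64_t gpa,
		  uint64_t size)
{
	struct mapping *free_slot = NULL;

	if (size == 0 || iova + size < iova)
		return -1;	/* reject empty and wrapping ranges */

	for (size_t i = 0; i < MAX_MAPPINGS; i++) {
		struct mapping *m = &as->maps[i];

		if (!m->used) {
			if (!free_slot)
				free_slot = m;
			continue;
		}
		if (iova < m->iova + m->size && m->iova < iova + size)
			return -1;	/* overlaps an existing mapping */
	}
	if (!free_slot)
		return -1;

	*free_slot = (struct mapping){ .iova = iova, .gpa = gpa,
				       .size = size, .used = true };
	return 0;
}

/* UNMAP-style request; only removes a range that was mapped as a whole. */
static int as_unmap(struct address_space *as, uint64_t iova, uint64_t size)
{
	for (size_t i = 0; i < MAX_MAPPINGS; i++) {
		struct mapping *m = &as->maps[i];

		if (m->used && m->iova == iova && m->size == size) {
			m->used = false;
			return 0;
		}
	}
	return -1;
}

/* Translate one device access of `len` bytes; returns false if it faults. */
static bool as_translate(const struct address_space *as, uint64_t iova,
			 uint64_t len, uint64_t *gpa)
{
	assert(gpa != NULL);

	for (size_t i = 0; i < MAX_MAPPINGS; i++) {
		const struct mapping *m = &as->maps[i];

		/* The whole access must fit inside a single mapping. */
		if (m->used && iova >= m->iova && len <= m->size &&
		    iova - m->iova <= m->size - len) {
			*gpa = m->gpa + (iova - m->iova);
			return true;
		}
	}
	return false;
}
```

In this model the virtio-net device would call as_translate() before
touching the ring or a buffer, and treat a false return as a translation
fault. A real implementation would more likely use an interval tree than a
linear table.
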
>> Physical and virtual IOMMUs are completely dissociated. The net driver is
>> mapping its own buffers via DMA/IOMMU API, and buffers are copied between
>> virtio-net and tap.
>>
>>
>> The description itself seemed too long for a single email, so I split it
>> into three documents, and will attach Linux and kvmtool patches to this
>> email.
>>
>> 	1. Firmware note,
>> 	2. device operations (draft for the virtio specification),
>> 	3. future work/possible improvements.
>>
>> Just to be clear on the terms I'm using:
>>
>> pIOMMU	physical IOMMU, controlling DMA accesses from physical devices
>> vIOMMU	virtual IOMMU (virtio-iommu), controlling DMA accesses from
>> 	physical and virtual devices to guest memory.
> 
> Maybe it would be clearer to say it controls 'virtual' DMA accesses, since
> we're essentially doing DMA virtualization here. Otherwise it reads a bit
> confusingly, since DMA accesses from a physical device should be
> controlled by the pIOMMU.
> 
>> GVA, GPA, HVA, HPA
>> 	Guest/Host Virtual/Physical Address
>> IOVA	I/O Virtual Address, the address accessed by a device doing DMA
>> 	through an IOMMU. In the context of a guest OS, IOVA is GVA.
> 
> This statement is not accurate. For kernel DMA protection, IOVA is a
> standalone per-device address space (definitely nothing to do with GVA).
> For user DMA protection, the user space driver decides how it wants to
> construct the IOVA address space: it could be a standalone one, or reuse
> GVA. In the virtualization case it is either GPA (w/o vIOMMU) or guest
> IOVA (w/ vIOMMU, where the guest creates the IOVA space).
> 
> Anyway, the IOVA concept is clear. Simply removing the example would
> probably leave it just as clear. :-)

Ok, I dropped most IOVA references from the RFC to avoid ambiguity anyway.
I'll tidy up my so-called clarifications next time :)

Thanks,
Jean-Philippe

>>
>> Note: kvmtool is GPLv2. Linux patches are GPLv2, except for UAPI
>> virtio-iommu.h header, which is BSD 3-clause. For the time being, the
>> specification draft in RFC 2/3 is also BSD 3-clause.
>>
>>
>> This proposal may be involuntarily centered around ARM architectures at
>> times. Any feedback would be appreciated, especially regarding other IOMMU
>> architectures.
>>
> 
> Thanks for doing this. I will definitely look at them in detail and give
> feedback.
> 
> Thanks
> Kevin
> 
>