RE: [RFC] virtio-iommu version 0.4

Bharat Bhushan <bharat.bhushan@xxxxxxx> · Wed, 13 Sep 2017 03:47:50 +0000

Hi Eric,

> -----Original Message-----
> From: Auger Eric [mailto:eric.auger@xxxxxxxxxx]
> Sent: Tuesday, September 12, 2017 10:43 PM
> To: Jean-Philippe Brucker <jean-philippe.brucker@xxxxxxx>;
> iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx;
> virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx; virtio-dev@xxxxxxxxxxxxxxxxxxxx
> Cc: will.deacon@xxxxxxx; robin.murphy@xxxxxxx;
> lorenzo.pieralisi@xxxxxxx; mst@xxxxxxxxxx; jasowang@xxxxxxxxxx;
> marc.zyngier@xxxxxxx; eric.auger.pro@xxxxxxxxx; Bharat Bhushan
> <bharat.bhushan@xxxxxxx>; peterx@xxxxxxxxxx; kevin.tian@xxxxxxxxx
> Subject: Re: [RFC] virtio-iommu version 0.4
> 
> Hi jean,
> 
> On 04/08/2017 20:19, Jean-Philippe Brucker wrote:
> > This is the continuation of my proposal for virtio-iommu, the para-
> > virtualized IOMMU. Here is a summary of the changes since last time [1]:
> >
> > * The virtio-iommu document now resembles an actual specification. It is
> >   split into a formal description of the virtio device, and implementation
> >   notes. Please find sources and binaries at [2].
> >
> > * Added a probe request to describe to the guest different properties that
> >   do not fit in firmware or in the virtio config space. This is a
> >   necessary stepping stone for extending the virtio-iommu.
> >
> > * There is a working Qemu prototype [3], thanks to Eric Auger and Bharat
> >   Bhushan.
> >
> > You can find the Linux driver and kvmtool device at [4] and [5]. I
> > plan to rework driver and kvmtool device slightly before sending the
> > patches.
> >
> > To understand the virtio-iommu, I advise to first read introduction
> > and motivation, then skim through implementation notes and finally
> > look at the device specification.
> >
> > I wasn't sure how to organize the review. For those who prefer to
> > comment inline, I attached v0.4 of device-operations.tex and
> > topology.tex+MSI.tex to this thread. They are the biggest chunks of
> > the document. But LaTeX isn't very pleasant to read, so you can simply
> > send a list of comments in relation to section numbers and a few words of
> context, we'll manage.
> >
> > ---
> > Version numbers 0.1-0.4 are arbitrary. I'm hoping they allow to
> > compare more easily differences since the RFC (see [6]), but haven't
> > been made public so far. This is the first public posting since
> > initial proposal [1], and the following describes all changes.
> >
> > ## v0.1 ##
> >
> > Content is the same as the RFC, but formatted to LaTeX. 'make'
> > generates one PDF and one HTML document.
> >
> > ## v0.2 ##
> >
> > Add introductions, improve topology example and firmware description
> > based on feedback and a number of useful discussions.
> >
> > ## v0.3 ##
> >
> > Add normative sections (MUST, SHOULD, etc). Clarify some things,
> > tighten the device and driver behaviour. Unmap semantics are
> > consolidated; they are now closer to VFIO Type1 v2 semantics.
> >
> > ## v0.4 ##
> >
> > Introduce PROBE requests. They provide per-endpoint information to the
> > driver that couldn't be described otherwise.
> >
> > For the moment, they allow to handle MSIs on x86 virtual platforms
> > (see 3.2). To do that we communicate reserved IOVA regions, that will
> > also be useful for describing regions that cannot be mapped for a
> > given endpoint, for instance addresses that correspond to a PCI bridge
> window.
> >
> > Introducing such a large framework for this tiny feature may seem
> > overkill, but it is needed for future extensions of the virtio-iommu
> > and I believe it really is worth the effort.
> 
> 2.6.7
> - As I am currently integrating v0.4 in QEMU here are some other comments:
> At the moment struct virtio_iommu_req_probe flags is missing in your
> header. As such I understood the ACK protocol was not implemented by the
> driver in your branch.
> - VIRTIO_IOMMU_PROBE_PROPERTY_TYPE_MASK is
> VIRTIO_IOMMU_T_MASK in your header too.
> 2.6.8.2:
> - I am really confused about what the device should report as resv regions
> depending on the PE nature (VFIO or not VFIO)
> 
> In other iommu drivers, the resv regions are populated by the iommu driver
> through its get_resv_regions callback. They are usually composed of an
> iommu specific MSI region (mapped or bypassed) and non IOMMU specific
> (device specific) reserved regions:
> iommu_dma_get_resv_regions(). In the case of virtio-iommu driver, those
> are the guest reserved regions.
> 
> First in the current virtio-iommu driver I don't see the
> iommu_dma_get_resv_regions call. Do you agree that the virtio-iommu
> driver should compute the non IOMMU specific MSI regions. ie. this is not
> the responsability of the virtio-iommu device.
> 
> Then why is it more the job of the device to return the guest iommu specific
> region rather than the driver itself?
> 
> Then I understand this is the responsability of the virtio-iommu device to
> gather information about the host resv regions in case of VFIO EP.
> Typically the host PCIe host bridge windows cannot be used for IOVA.
> Also the host MSI reserved IOVA window cannot be used. Do you agree.
> 
> I really think the spec should clarify what exact resv regions the device should
> return in case of VFIO device and non VFIO device.

My understanding is that reserved regions are 1) Host reserved regions ( This is applicable for VFIO) 2) Guest MSI reserved region (This is applicable for emulated devices as well)
Does this make sense to have reserved regions per address-space/domain?

Thanks
-Bharat

> 
> Thanks
> 
> Eric
> 
> >
> > ## Future ##
> >
> > Other extensions are in preparation. I won't detail them here because
> > v0.4 already is a lot to digest, but in short, building on top of PROBE:
> >
> > * First, since the IOMMU is paravirtualized, the device can expose some
> >   properties of the physical topology to the guest, and let it allocate
> >   resources more efficiently. For example, when the virtio-iommu manages
> >   both physical and emulated endpoints, with different underlying
> IOMMUs,
> >   we now have a way to describe multiple page and block granularities,
> >   instead of forcing the guest to use the most restricted one for all
> >   endpoints. This will most likely be in v0.5.
> >
> > * Then on top of that, a major improvement will describe hardware
> >   acceleration features available to the guest. There is what I call "Page
> >   Table Handover" (or simply, from the host POV, "Nested"), the ability
> >   for the guest to manipulate its own page tables instead of sending
> >   MAP/UNMAP requests to the host. This, along with IO Page Fault
> >   reporting, will also permit SVM virtualization on different platforms.
> >
> > Thanks,
> > Jean
> >
> > [1] http://www.spinics.net/lists/kvm/msg147990.html
> > [2] git://linux-arm.org/virtio-iommu.git branch viommu/v0.4
> >     http://www.linux-arm.org/git?p=virtio-
> iommu.git;a=blob;f=dist/v0.4/virtio-iommu-v0.4.pdf
> >     I reiterate the disclaimers: don't use this document as a reference,
> >     it's a draft. It's also not an OASIS document yet. It may be riddled
> >     with mistakes. As this is a working draft, it is unstable and I do not
> >     guarantee backward compatibility of future versions.
> > [3] https://lists.gnu.org/archive/html/qemu-arm/2017-08/msg00004.html
> > [4] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.4
> >     Warning: UAPI headers have changed! They didn't follow the spec,
> >     please update. (Use branch v0.1, that has the old headers, for the
> >     Qemu prototype [3])
> > [5] git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.4
> >     Warning: command-line has changed! Use --viommu vfio[,opts] and
> >     --viommu virtio[,opts] to instantiate a device.
> > [6]
> > http://www.linux-arm.org/git?p=virtio-iommu.git;a=tree;f=dist/diffs
> >