Hi Eric, > -----Original Message----- > From: Auger Eric [mailto:eric.auger@xxxxxxxxxx] > Sent: Tuesday, September 12, 2017 10:43 PM > To: Jean-Philippe Brucker <jean-philippe.brucker@xxxxxxx>; > iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; > virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx; virtio-dev@xxxxxxxxxxxxxxxxxxxx > Cc: will.deacon@xxxxxxx; robin.murphy@xxxxxxx; > lorenzo.pieralisi@xxxxxxx; mst@xxxxxxxxxx; jasowang@xxxxxxxxxx; > marc.zyngier@xxxxxxx; eric.auger.pro@xxxxxxxxx; Bharat Bhushan > <bharat.bhushan@xxxxxxx>; peterx@xxxxxxxxxx; kevin.tian@xxxxxxxxx > Subject: Re: [RFC] virtio-iommu version 0.4 > > Hi jean, > > On 04/08/2017 20:19, Jean-Philippe Brucker wrote: > > This is the continuation of my proposal for virtio-iommu, the para- > > virtualized IOMMU. Here is a summary of the changes since last time [1]: > > > > * The virtio-iommu document now resembles an actual specification. It is > > split into a formal description of the virtio device, and implementation > > notes. Please find sources and binaries at [2]. > > > > * Added a probe request to describe to the guest different properties that > > do not fit in firmware or in the virtio config space. This is a > > necessary stepping stone for extending the virtio-iommu. > > > > * There is a working Qemu prototype [3], thanks to Eric Auger and Bharat > > Bhushan. > > > > You can find the Linux driver and kvmtool device at [4] and [5]. I > > plan to rework driver and kvmtool device slightly before sending the > > patches. > > > > To understand the virtio-iommu, I advise to first read introduction > > and motivation, then skim through implementation notes and finally > > look at the device specification. > > > > I wasn't sure how to organize the review. For those who prefer to > > comment inline, I attached v0.4 of device-operations.tex and > > topology.tex+MSI.tex to this thread. They are the biggest chunks of > > the document. But LaTeX isn't very pleasant to read, so you can simply > > send a list of comments in relation to section numbers and a few words of > context, we'll manage. > > > > --- > > Version numbers 0.1-0.4 are arbitrary. I'm hoping they allow to > > compare more easily differences since the RFC (see [6]), but haven't > > been made public so far. This is the first public posting since > > initial proposal [1], and the following describes all changes. > > > > ## v0.1 ## > > > > Content is the same as the RFC, but formatted to LaTeX. 'make' > > generates one PDF and one HTML document. > > > > ## v0.2 ## > > > > Add introductions, improve topology example and firmware description > > based on feedback and a number of useful discussions. > > > > ## v0.3 ## > > > > Add normative sections (MUST, SHOULD, etc). Clarify some things, > > tighten the device and driver behaviour. Unmap semantics are > > consolidated; they are now closer to VFIO Type1 v2 semantics. > > > > ## v0.4 ## > > > > Introduce PROBE requests. They provide per-endpoint information to the > > driver that couldn't be described otherwise. > > > > For the moment, they allow to handle MSIs on x86 virtual platforms > > (see 3.2). To do that we communicate reserved IOVA regions, that will > > also be useful for describing regions that cannot be mapped for a > > given endpoint, for instance addresses that correspond to a PCI bridge > window. > > > > Introducing such a large framework for this tiny feature may seem > > overkill, but it is needed for future extensions of the virtio-iommu > > and I believe it really is worth the effort. > > 2.6.7 > - As I am currently integrating v0.4 in QEMU here are some other comments: > At the moment struct virtio_iommu_req_probe flags is missing in your > header. As such I understood the ACK protocol was not implemented by the > driver in your branch. > - VIRTIO_IOMMU_PROBE_PROPERTY_TYPE_MASK is > VIRTIO_IOMMU_T_MASK in your header too. > 2.6.8.2: > - I am really confused about what the device should report as resv regions > depending on the PE nature (VFIO or not VFIO) > > In other iommu drivers, the resv regions are populated by the iommu driver > through its get_resv_regions callback. They are usually composed of an > iommu specific MSI region (mapped or bypassed) and non IOMMU specific > (device specific) reserved regions: > iommu_dma_get_resv_regions(). In the case of virtio-iommu driver, those > are the guest reserved regions. > > First in the current virtio-iommu driver I don't see the > iommu_dma_get_resv_regions call. Do you agree that the virtio-iommu > driver should compute the non IOMMU specific MSI regions. ie. this is not > the responsability of the virtio-iommu device. > > Then why is it more the job of the device to return the guest iommu specific > region rather than the driver itself? > > Then I understand this is the responsability of the virtio-iommu device to > gather information about the host resv regions in case of VFIO EP. > Typically the host PCIe host bridge windows cannot be used for IOVA. > Also the host MSI reserved IOVA window cannot be used. Do you agree. > > I really think the spec should clarify what exact resv regions the device should > return in case of VFIO device and non VFIO device. My understanding is that reserved regions are 1) Host reserved regions ( This is applicable for VFIO) 2) Guest MSI reserved region (This is applicable for emulated devices as well) Does this make sense to have reserved regions per address-space/domain? Thanks -Bharat > > Thanks > > Eric > > > > > ## Future ## > > > > Other extensions are in preparation. I won't detail them here because > > v0.4 already is a lot to digest, but in short, building on top of PROBE: > > > > * First, since the IOMMU is paravirtualized, the device can expose some > > properties of the physical topology to the guest, and let it allocate > > resources more efficiently. For example, when the virtio-iommu manages > > both physical and emulated endpoints, with different underlying > IOMMUs, > > we now have a way to describe multiple page and block granularities, > > instead of forcing the guest to use the most restricted one for all > > endpoints. This will most likely be in v0.5. > > > > * Then on top of that, a major improvement will describe hardware > > acceleration features available to the guest. There is what I call "Page > > Table Handover" (or simply, from the host POV, "Nested"), the ability > > for the guest to manipulate its own page tables instead of sending > > MAP/UNMAP requests to the host. This, along with IO Page Fault > > reporting, will also permit SVM virtualization on different platforms. > > > > Thanks, > > Jean > > > > [1] http://www.spinics.net/lists/kvm/msg147990.html > > [2] git://linux-arm.org/virtio-iommu.git branch viommu/v0.4 > > http://www.linux-arm.org/git?p=virtio- > iommu.git;a=blob;f=dist/v0.4/virtio-iommu-v0.4.pdf > > I reiterate the disclaimers: don't use this document as a reference, > > it's a draft. It's also not an OASIS document yet. It may be riddled > > with mistakes. As this is a working draft, it is unstable and I do not > > guarantee backward compatibility of future versions. > > [3] https://lists.gnu.org/archive/html/qemu-arm/2017-08/msg00004.html > > [4] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.4 > > Warning: UAPI headers have changed! They didn't follow the spec, > > please update. (Use branch v0.1, that has the old headers, for the > > Qemu prototype [3]) > > [5] git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.4 > > Warning: command-line has changed! Use --viommu vfio[,opts] and > > --viommu virtio[,opts] to instantiate a device. > > [6] > > http://www.linux-arm.org/git?p=virtio-iommu.git;a=tree;f=dist/diffs > >