Hi Jean,

On Wed Oct 25, 2017 at 10:07:53AM +0100, Jean-Philippe Brucker wrote:
> On 25/10/17 08:07, Linu Cherian wrote:
> > Hi Jean,
> > 
> > On Tue Oct 24, 2017 at 10:28:59PM +0530, Linu Cherian wrote:
> >> Hi Jean,
> >> Thanks for your reply.
> >> 
> >> On Tue Oct 24, 2017 at 09:37:12AM +0100, Jean-Philippe Brucker wrote:
> >>> Hi Linu,
> >>> 
> >>> On 24/10/17 07:27, Linu Cherian wrote:
> >>>> Hi Jean,
> >>>> 
> >>>> On Mon Oct 23, 2017 at 10:32:41AM +0100, Jean-Philippe Brucker wrote:
> >>>>> This is version 0.5 of the virtio-iommu specification, the
> >>>>> paravirtualized IOMMU. This version addresses feedback from v0.4
> >>>>> and adds an event virtqueue. Please find the specification, LaTeX
> >>>>> sources and pdf, at:
> >>>>> git://linux-arm.org/virtio-iommu.git viommu/v0.5
> >>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf
> >>>>> 
> >>>>> A detailed changelog since v0.4 follows. You can find the pdf diff at:
> >>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf
> >>>>> 
> >>>>> * Add an event virtqueue for the device to report translation
> >>>>>   faults to the driver. For the moment only unrecoverable faults
> >>>>>   are available, but future versions will extend it.
> >>>>> * Simplify the PROBE request by removing the ack part and
> >>>>>   flattening RESV properties.
> >>>>> * Rename "address space" to "domain". The change might seem futile
> >>>>>   but allows introducing PASIDs and other features cleanly in the
> >>>>>   next versions. In the same vein, the few remaining "device"
> >>>>>   occurrences were replaced by "endpoint", to avoid any confusion
> >>>>>   with "the device" referring to the virtio device across the
> >>>>>   document.
> >>>>> * Add implementation notes for RESV_MEM properties.
> >>>>> * Update the ACPI table definition.
> >>>>> * Fix typos and clarify a few things.
> >>>>> 
> >>>>> I will publish the Linux driver for v0.5 shortly. Then for the
> >>>>> next versions I'll focus on optimizations and adding support for
> >>>>> hardware acceleration.
> >>>>> 
> >>>>> Existing implementations are simple and can certainly be
> >>>>> optimized, even without architectural changes. But the
> >>>>> architecture itself can also be improved in a number of ways.
> >>>>> Currently it is designed to work well with VFIO. However, having
> >>>>> explicit MAP requests is less efficient* than page tables for
> >>>>> emulated and PV endpoints, and the current architecture doesn't
> >>>>> address this. Binding page tables is an obvious way to improve
> >>>>> throughput in that case, but we can explore cleverer (and possibly
> >>>>> simpler) ways to do it.
> >>>>> 
> >>>>> So first we'll work on getting the base device and driver merged,
> >>>>> then we'll analyze and compare several ideas for improving
> >>>>> performance.
> >>>>> 
> >>>>> Thanks,
> >>>>> Jean
> >>>>> 
> >>>>> * I have yet to study this behaviour, and would be interested in
> >>>>>   any prior art on the subject of analyzing devices' DMA patterns
> >>>>>   (virtio and others)
> >>>> 
> >>>> From the spec, under future extensions:
> >>>> 
> >>>> "Page Table Handover, to allow guests to manage their own page
> >>>> tables and share them with the MMU"
> >>>> 
> >>>> Had a few questions on this.
> >>>> 
> >>>> 1. Did you mean SVM support for vfio-pci devices attached to guest
> >>>> processes here?
> >>> 
> >>> Yes, using the VFIO BIND and INVALIDATE ioctls that Intel is working
> >>> on, and adding requests in pretty much the same format to
> >>> virtio-iommu.
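Just to make sure I read this right: I imagine such a request would look
roughly like the sketch below on the virtio-iommu request queue. This is
purely illustrative and not in the v0.5 spec; the struct and field names
are my guesses, with only the head/status convention borrowed from the
existing requests.

#include <linux/types.h>

/* Hypothetical table-bind request, NOT part of the v0.5 spec. It only
 * mirrors what the VFIO BIND data might carry; every name below is an
 * assumption for the sake of discussion.
 */
struct virtio_iommu_req_head {
	__u8	type;		/* request type, e.g. a new BIND value */
	__u8	reserved[3];
};

struct virtio_iommu_req_bind {
	struct virtio_iommu_req_head head;
	__le32	domain;		/* domain to bind the table to */
	__le32	format;		/* table format advertised by PROBE */
	__le64	table;		/* GPA of the PASID/context table */
	__u8	status;		/* written back by the device */
	__u8	padding[3];
};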
> >>>> 2. Can you give some hints on how this is going to work, since the
> >>>> virtio-iommu guest kernel driver needs to create the stage 1 page
> >>>> table as required by the hardware, which is not the case now. CMIIW.
> >>> 
> >>> The virtio-iommu device advertises which PASID/page table format is
> >>> supported by the host (obtained via sysfs and communicated in the
> >>> PROBE request), then the guest binds page tables or PASID tables to
> >>> a domain and populates it. Binding page tables alone is easy because
> >>> we already have the required drivers in the guest (io-pgtable or
> >>> arch/* for SVM) and code in the host to manage PASID tables. But
> >>> since the PASID table pointer is translated by stage-2, it would
> >>> require a little more work in the host for obtaining GPA buffers
> >>> from the guest on demand.
> >> 
> >> Is this for resolving PCI PRI requests? IIUC, PCI PRI requests for
> >> devices owned by the guest need to be resolved by the guest itself.
> 
> Supporting PCI PRI is a separate problem, which will be addressed by
> extending the event queue proposed in v0.5. Once the guest has bound
> the PASID table and created the page tables, it will start some DMA
> job in the device. If a page isn't mapped, the pIOMMU sends a PRI
> Request (a page fault) to its driver, which is relayed to userspace by
> VFIO, then to the guest via virtio-iommu. The guest handles the fault,
> then sends a PRI response on the virtio-iommu request queue, relayed
> to the pIOMMU driver via VFIO, and the device retries the access.
> 
> >>> In addition the BIND ioctl is different from the one used by VT-d,
> >>> so this solution didn't get much appreciation.
> >> 
> >> Could you please share the links on this?
> 
> Please find the latest discussion at
> https://www.mail-archive.com/iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx/msg20189.html
> 
> >>> The alternative is to bind PASID tables.
> >> 
> >> Sorry, I didn't get the difference here.
> 
> A PASID table is what we call a Context Table in SMMU: it's the array
> associating a PASID (SSID) with a context descriptor. In the SMMUv3
> the stream table entry (device descriptor) points to a PASID table.
> Each context descriptor in the PASID table points to a page directory
> (pgd).
> 
> So the first solution was for the guest to send a BIND with pasid+pgd,
> and let the host deal with the context tables. The second solution is
> to send a BIND with a PASID table pointer, and have the guest handle
> the context table.
> 
> > Also, does this solution intend to cover page table sharing for
> > non-SVM cases? For example, if we need to share the IOMMU page table
> > for a device used in the guest kernel, so that map/unmap gets
> > directly handled by the guest and only TLB invalidates happen
> > through a virtio-iommu channel.
> 
> Yes, for non-SVM in SMMUv3 you still have a context table, but with a
> single descriptor, so the interface stays the same.

So for the non-SVM case, the guest virtio-iommu driver will program the
context descriptor in such a way that the ASID is not in the shared set
(ASET = 1b), and hence physical IOMMU TLB invalidates would get
triggered from software for every viommu_unmap (in the guest kernel)
through QEMU (using VFIO ioctls)? And for the SVM case, the ASID would
be in the shared set, so explicit TLB invalidates are not required from
software? (Rough sketch of what I mean below.)

> But with the second solution, nested with SMMUv2 isn't supported since
> it doesn't have context tables. The second solution was considered
> simpler to implement, so we'll first go with this one.
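To put my ASET question in concrete terms, here is roughly how I picture
the guest filling word 0 of the context descriptor for the two cases.
The bit positions follow the SMMUv3 context descriptor layout (ASID in
[63:48], ASET at bit 47, V at bit 31); the macro and helper names are
mine, and the TCR/TTBR fields are elided.

#include <linux/bitfield.h>
#include <linux/bits.h>
#include <linux/types.h>

/* Illustration only: word 0 of an SMMUv3 context descriptor for a
 * private (non-SVM) vs shared (SVM) ASID. Macro/helper names are mine.
 */
#define CD_0_V		BIT_ULL(31)		/* descriptor valid */
#define CD_0_ASET	BIT_ULL(47)		/* 1b: ASID not in the shared set */
#define CD_0_ASID	GENMASK_ULL(63, 48)

static u64 make_cd_word0(u16 asid, bool svm)
{
	u64 val = CD_0_V;	/* TCR/TTBR fields elided for brevity */

	val |= FIELD_PREP(CD_0_ASID, asid);
	if (!svm)
		/* Private ASID: each viommu_unmap must be followed by an
		 * explicit TLB invalidation sent to the pIOMMU. */
		val |= CD_0_ASET;
	/* SVM: ASET = 0, broadcast TLB maintenance from the CPU applies. */
	return val;
}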
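Similarly, for the PRI relay you describe above, I would expect the page
request to show up on the guest's event queue as something like the
following, extending the unrecoverable-fault event of v0.5. Again, every
field name here is a guess, just listing what the guest would need in
order to build the PRI response:

#include <linux/types.h>

/* Guess at a recoverable-fault (PRI) event, as a future extension of
 * the v0.5 unrecoverable-fault event. No field here is in the spec.
 */
struct virtio_iommu_evt_page_req {
	__u8	reason;		/* page request, vs. unrecoverable fault */
	__u8	reserved[3];
	__le32	flags;		/* read/write/exec, privileged, last-in-group */
	__le32	endpoint;	/* endpoint that issued the PRI request */
	__le32	pasid;		/* PASID (SSID), when present */
	__le64	address;	/* page address to make resident */
	__le32	grpid;		/* PRG index to copy into the PRI response */
};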
> 
> Thanks,
> Jean
> 
> >>> It requires factoring the guest PASID handling code into a library,
> >>> which is difficult for SMMU. Luckily I'm still working on adding
> >>> PASID code for SMMUv3, so extracting it out of the driver isn't a
> >>> big overhead. The good thing about this solution is that it reuses
> >>> any specification work done for VFIO (and vice versa) and any host
> >>> driver change made for vSMMU/VT-d emulations.
> >>> 
> >>> Thanks,
> >>> Jean
> >> 
> >> -- 
> >> Linu cherian
> 

-- 
Linu cherian