On 25/10/17 08:07, Linu Cherian wrote:
> Hi Jean,
>
> On Tue Oct 24, 2017 at 10:28:59PM +0530, Linu Cherian wrote:
>> Hi Jean,
>> Thanks for your reply.
>>
>> On Tue Oct 24, 2017 at 09:37:12AM +0100, Jean-Philippe Brucker wrote:
>>> Hi Linu,
>>>
>>> On 24/10/17 07:27, Linu Cherian wrote:
>>>> Hi Jean,
>>>>
>>>> On Mon Oct 23, 2017 at 10:32:41AM +0100, Jean-Philippe Brucker wrote:
>>>>> This is version 0.5 of the virtio-iommu specification, the paravirtualized
>>>>> IOMMU. This version addresses feedback from v0.4 and adds an event virtqueue.
>>>>> Please find the specification, LaTeX sources and pdf, at:
>>>>>     git://linux-arm.org/virtio-iommu.git viommu/v0.5
>>>>>     http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf
>>>>>
>>>>> A detailed changelog since v0.4 follows. You can find the pdf diff at:
>>>>>     http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf
>>>>>
>>>>> * Add an event virtqueue for the device to report translation faults to
>>>>>   the driver. For the moment only unrecoverable faults are available but
>>>>>   future versions will extend it.
>>>>> * Simplify PROBE request by removing the ack part, and flattening RESV
>>>>>   properties.
>>>>> * Rename "address space" to "domain". The change might seem futile but
>>>>>   allows to introduce PASIDs and other features cleanly in the next
>>>>>   versions. In the same vein, the few remaining "device" occurrences were
>>>>>   replaced by "endpoint", to avoid any confusion with "the device"
>>>>>   referring to the virtio device across the document.
>>>>> * Add implementation notes for RESV_MEM properties.
>>>>> * Update ACPI table definition.
>>>>> * Fix typos and clarify a few things.
>>>>>
>>>>> I will publish the Linux driver for v0.5 shortly. Then for next versions
>>>>> I'll focus on optimizations and adding support for hardware acceleration.
>>>>>
>>>>> Existing implementations are simple and can certainly be optimized, even
>>>>> without architectural changes. But the architecture itself can also be
>>>>> improved in a number of ways. Currently it is designed to work well with
>>>>> VFIO. However, having explicit MAP requests is less efficient* than page
>>>>> tables for emulated and PV endpoints, and the current architecture doesn't
>>>>> address this. Binding page tables is an obvious way to improve throughput
>>>>> in that case, but we can explore cleverer (and possibly simpler) ways to
>>>>> do it.
>>>>>
>>>>> So first we'll work on getting the base device and driver merged, then
>>>>> we'll analyze and compare several ideas for improving performance.
>>>>>
>>>>> Thanks,
>>>>> Jean
>>>>>
>>>>> * I have yet to study this behaviour, and would be interested in any
>>>>>   prior art on the subject of analyzing devices DMA patterns (virtio and
>>>>>   others)
>>>>
>>>>
>>>> From the spec,
>>>> Under future extensions.
>>>>
>>>> "Page Table Handover, to allow guests to manage their own page tables and share them with the MMU"
>>>>
>>>> Had few questions on this.
>>>>
>>>> 1. Did you mean SVM support for vfio-pci devices attached to guest processes here.
>>>
>>> Yes, using the VFIO BIND and INVALIDATE ioctls that Intel is working on,
>>> and adding requests in pretty much the same format to virtio-iommu.
>>>
>>>> 2. Can you give some hints on how this is going to work , since virtio-iommu guest kernel
>>>> driver need to create stage 1 page table as required by hardware which is not the case now.
>>>> CMIIW.
>>>
>>> The virtio-iommu device advertises which PASID/page table format is
>>> supported by the host (obtained via sysfs and communicated in the PROBE
>>> request), then the guest binds page tables or PASID tables to a domain and
>>> populates it.
>>> Binding page tables alone is easy because we already have
>>> the required drivers in the guest (io-pgtable or arch/* for SVM) and code
>>> in the host to manage PASID tables. But since the PASID table pointer is
>>> translated by stage-2, it would requires a little more work in the host
>>> for obtaining GPA buffers from the guest on demand.
>> Is this for resolving PCI PRI requests ?.
>> IIUC, PCI PRI requests for devices owned by guest need to be resolved
>> by guest itself.

Supporting PCI PRI is a separate problem, which will be addressed by
extending the event queue proposed in v0.5. Once the guest has bound the
PASID table and created the page tables, it will start a DMA job in the
device. If a page isn't mapped, the pIOMMU sends a PRI Request (a page
fault) to its driver. The request is relayed to userspace by VFIO, then
to the guest via virtio-iommu. The guest handles the fault, then sends a
PRI response on the virtio-iommu request queue. The response is relayed
to the pIOMMU driver via VFIO, and the device retries the access.

>> In addition the BIND
>>> ioctl is different from the one used by VT-d, so this solution didn't get
>>> much appreciation.
>>
>> Could you please share the links on this ?

Please find the latest discussion at
https://www.mail-archive.com/iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx/msg20189.html

>>> The alternative is to bind PASID tables.
>>
>> Sorry, i didnt get the difference here.

The PASID table is what we call the Context Table in SMMU: it's the array
associating a PASID (SSID) with a context descriptor. In SMMUv3, the
stream table entry (device descriptor) points to a PASID table. Each
context descriptor in the PASID table points to a page directory (pgd).

So the first solution was for the guest to send a BIND with pasid+pgd,
and let the host deal with the context tables. The second solution is to
send a BIND with a PASID table pointer, and have the guest handle the
context table.

> Also does this solution intend to cover the page table sharing of non SVM
> cases.
> For example, if we need to share the IOMMU page table for
> a device used in guest kernel, so that map/unmap gets directly handled by the guest
> and only TLB invalidates happens through a virtio-iommu channel.

Yes. For non-SVM in SMMUv3, you still have a context table, but with a
single descriptor, so the interface stays the same. With the second
solution, however, nesting with SMMUv2 isn't supported, since SMMUv2
doesn't have context tables.

The second solution was considered simpler to implement, so we'll go
with that one first.

Thanks,
Jean

>> It requires to factor the guest
>>> PASID handling code into a library, which is difficult for SMMU. Luckily
>>> I'm still working on adding PASID code for SMMUv3, so extracting it out of
>>> the driver isn't a big overhead. The good thing about this solution is
>>> that it reuses any specification work done for VFIO (and vice versa) and
>>> any host driver change made for vSMMU/VT-d emulations.
>>>
>>> Thanks,
>>> Jean
>>
>> --
>> Linu cherian
>
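
To make the difference between the two BIND options above more concrete,
here is a rough toy model in Python. It is only a sketch of the data
structure relationships described in this thread (stream table entry ->
PASID table -> context descriptor -> pgd), not virtio-iommu or SMMUv3
code. All names (Host, bind_pgd, bind_pasid_table, lookup_pgd) are
hypothetical and chosen for illustration, and the real tables live in
memory walked by hardware rather than in dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ContextDescriptor:
    pgd: int  # address of the page directory used for this PASID


@dataclass
class PasidTable:
    # Maps a PASID (SSID) to its context descriptor.
    entries: Dict[int, ContextDescriptor] = field(default_factory=dict)


@dataclass
class StreamTableEntry:
    # Per-endpoint descriptor; points to a PASID table once bound.
    pasid_table: Optional[PasidTable] = None


class Host:
    """Hypothetical host side: owns the stream table in both options."""

    def __init__(self) -> None:
        self.stream_table: Dict[int, StreamTableEntry] = {}

    def bind_pgd(self, endpoint: int, pasid: int, pgd: int) -> None:
        # Option 1: the guest sends pasid+pgd and the host maintains
        # the PASID (context) table on the guest's behalf.
        ste = self.stream_table.setdefault(endpoint, StreamTableEntry())
        if ste.pasid_table is None:
            ste.pasid_table = PasidTable()
        ste.pasid_table.entries[pasid] = ContextDescriptor(pgd)

    def bind_pasid_table(self, endpoint: int, table: PasidTable) -> None:
        # Option 2: the guest owns the PASID table; the host only
        # installs a pointer to it in the stream table entry.
        ste = self.stream_table.setdefault(endpoint, StreamTableEntry())
        ste.pasid_table = table

    def lookup_pgd(self, endpoint: int, pasid: int) -> Optional[int]:
        # Walk: stream table entry -> PASID table -> context descriptor.
        ste = self.stream_table.get(endpoint)
        if ste is None or ste.pasid_table is None:
            return None
        cd = ste.pasid_table.entries.get(pasid)
        return cd.pgd if cd is not None else None
```

In this model, a non-SVM endpoint under option 2 is simply a PasidTable
holding a single descriptor, so the bind call is the same as for SVM.
Note that with option 2 the guest can add descriptors after the bind
without going back to the host, which is why the host in that case
mostly needs to forward invalidations.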