On 07/04/17 22:19, Michael S. Tsirkin wrote:
> On Fri, Apr 07, 2017 at 08:17:44PM +0100, Jean-Philippe Brucker wrote:
>> There are a number of advantages in a paravirtualized IOMMU over a full
>> emulation. It is portable and could be reused on different architectures.
>> It is easier to implement than a full emulation, with less state tracking.
>> It might be more efficient in some cases, with less context switches to
>> the host and the possibility of in-kernel emulation.
>
> Thanks, this is very interesting. I am read to read it all, but I really
> would like you to expand some more on the motivation for this work.
> Productising this would be quite a bit of work. Spending just 6 lines on
> motivation seems somewhat disproportionate. In particular, do you have
> any specific efficiency measurements or estimates that you can share?

The main motivation for this work is to bring IOMMU virtualization to the
ARM world. We don't have any vIOMMU there at the moment, and a full ARM SMMU
virtualization solution would be counter-productive. We would have to
implement it for SMMUv2, for the completely orthogonal SMMUv3, and for any
future version of the architecture. Doing so in userspace might be
acceptable, but then, for performance reasons, people will want in-kernel
emulation of every IOMMU variant out there, which is a maintenance and
security nightmare.

A single generic vIOMMU is preferable because it reduces maintenance cost
and attack surface. The transport code is the same as for any virtio
device, for both userspace and in-kernel implementations. So instead of
rewriting everything from scratch for each IOMMU variant (along with all
the bugs that come with it), we reuse well-tested code for the transport and
write the emulation layer once and for all. Note that this work applies to
any architecture with an IOMMU, not only to ARM and its partners.

Introducing an IOMMU specially designed for virtualization allows us to get
rid of the complex state tracking inherent in full IOMMU emulations.
With a full emulation, all guest accesses to page tables and configuration
structures have to be trapped and interpreted. A Virtio interface provides
well-defined semantics and doesn't need to guess what the guest is trying
to do. It transmits requests made by guest device drivers to the host IOMMU
almost unaltered, removing the intermediate layer of arch-specific
configuration structures and page tables.

Using a portable standard like Virtio also allows for efficient IOMMU
virtualization when guest and host are built for different architectures
(for instance, when using Qemu TCG). In-kernel emulation would still work
with vhost-iommu, whereas platform-specific vIOMMUs would have to stay in
userspace.

I don't have any measurements at the moment; it is a bit early for that.
The kvmtool example was developed on a software model and is mostly here
for illustrative purposes; a Qemu implementation would be more suitable for
performance analysis. I wouldn't be able to give meaning to these numbers
anyway, since on ARM we don't have any existing solution to compare them
against. For a very rough estimate, one could compare the complexity of
handling guest accesses and parsing page tables in Qemu's VT-d emulation
with that of reading a chain of buffers in Virtio.

Thanks,
Jean-Philippe