On Wed, 2015-11-11 at 07:56 -0800, Andy Lutomirski wrote:
> > Can you flesh out this trick?
>
> On x86 IIUC the IOMMU more-or-less defaults to passthrough.  If the
> kernel wants, it can switch it to a non-passthrough mode.  My patches
> cause the virtio driver to do exactly this, except that the host
> implementation doesn't actually exist yet, so the patches will instead
> have no particular effect.

At some level, yes: we're compatible with a 1982 IBM PC and thus the
IOMMU is entirely disabled at boot until the kernel turns it on, except
in TXT mode where we abandon that compatibility.

But no, the virtio driver has *nothing* to do with switching the device
out of passthrough mode. It is either in passthrough mode, or it isn't.

If the VMM *doesn't* expose an IOMMU to the guest, obviously the devices
are in passthrough mode. If the guest kernel doesn't have IOMMU support
enabled, then obviously the devices are in passthrough mode. And if the
ACPI tables exposed to the guest kernel *tell* it that the virtio
devices are not actually behind the IOMMU (which qemu gets wrong), then
it'll be in passthrough mode.

If the IOMMU is exposed, and enabled, and telling the guest kernel that
it *does* cover the virtio devices, then those virtio devices will *not*
be in passthrough mode.

You choosing to use the DMA API in the virtio device drivers instead of
being buggy has nothing to do with whether it's actually in passthrough
mode or not. Whether it's in passthrough mode or not, using the DMA API
is technically the right thing to do, because it should either *do* the
translation, or return a 1:1 mapped IOVA, as appropriate.

> On powerpc and sparc, we *already* screwed up.  The host already tells
> the guest that there's an IOMMU and that it's *enabled* because those
> platforms don't have selective IOMMU coverage the way that x86 does.
> So we need to work around it.

No, we need it on x86 too because once we fix the virtio device driver
bug and make it start using the DMA API, then we start to trip up on the
qemu bug where it lies about which devices are covered by the IOMMU.

Of course, we still have that same qemu bug w.r.t. assigned devices,
which it *also* claims are behind its IOMMU when they're not...

> I think that, if we want fancy virt-friendly IOMMU stuff like you're
> talking about, then the right thing to do is to create a virtio bus
> instead of pretending to be PCI.  That bus could have a virtio IOMMU
> and its own cross-platform enumeration mechanism for devices on the
> bus, and everything would be peachy.

That doesn't really help very much for the x86 case where the problem is
compatibility with *existing* (arguably broken) qemu implementations.

Having said that, if this were real hardware I'd just be blacklisting it
and saying "Another BIOS with broken DMAR tables --> IOMMU completely
disabled". So perhaps we should just do that.

> I still don't understand what trick.  If we want virtio devices to be
> assignable, then they should be translated through the IOMMU, and the
> DMA API is the right interface for that.

The DMA API is the right interface *regardless* of whether there's
actual translation to be done. The device driver itself should not be
involved in any way with that decision.

When you want to access MMIO, you use ioremap() and writel() instead of
doing random crap for yourself. When you want DMA, you use the DMA API
to get a bus address for your device *even* if you expect there to be no
IOMMU and you expect it to precisely match the physical address. No
excuses.

--
dwmw2
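
The point that the driver asks for a bus address and never decides for
itself whether translation happens can be sketched as a small user-space
C model. This is not kernel code and does not use the real DMA API:
`struct model_device`, `model_dma_map()`, `driver_prepare_buffer()` and
the way the pretend IOMMU picks an address are all hypothetical, invented
only to show the shape of the argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bus address; not the kernel's dma_addr_t. */
typedef uint64_t model_dma_addr_t;

/* Hypothetical device description. Whether the device sits behind an
 * IOMMU is a platform/firmware fact; the driver never sets it. */
struct model_device {
    bool     behind_iommu; /* what the platform tables say */
    uint64_t iova_base;    /* where the pretend IOMMU places mappings */
};

/* Stand-in for a DMA mapping call. It either returns a 1:1 address
 * (passthrough) or a translated one; the caller cannot tell which
 * and does not need to. The translation rule here is made up. */
static model_dma_addr_t model_dma_map(const struct model_device *dev,
                                      uint64_t phys)
{
    assert(dev != NULL);
    if (!dev->behind_iommu)
        return phys;                          /* passthrough: 1:1 */
    return dev->iova_base + (phys & 0xfffULL); /* keep the page offset */
}

/* The "driver": identical code for both cases. It hands the device
 * whatever address the mapping layer returned, never the raw
 * physical address. */
static model_dma_addr_t driver_prepare_buffer(const struct model_device *dev,
                                              uint64_t buf_phys)
{
    return model_dma_map(dev, buf_phys);
}
```

Calling `driver_prepare_buffer()` with a device whose `behind_iommu` is
false gives back the physical address unchanged; with it set to true the
same driver code gets a translated address. The driver is the same in
both cases, which is the whole point.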