Re: [PATCH v3 0/3] virtio DMA API core stuff

David Woodhouse <dwmw2@xxxxxxxxxxxxx> · Wed, 11 Nov 2015 23:30:27 +0100

On Wed, 2015-11-11 at 07:56 -0800, Andy Lutomirski wrote:
> 
> Can you flesh out this trick?
> 
> On x86 IIUC the IOMMU more-or-less defaults to passthrough.  If the
> kernel wants, it can switch it to a non-passthrough mode.  My patches
> cause the virtio driver to do exactly this, except that the host
> implementation doesn't actually exist yet, so the patches will instead
> have no particular effect.

At some level, yes — we're compatible with a 1982 IBM PC and thus the
IOMMU is entirely disabled at boot until the kernel turns it on —
except in TXT mode where we abandon that compatibility.

But no, the virtio driver has *nothing* to do with switching the device
out of passthrough mode. It is either in passthrough mode, or it isn't.

If the VMM *doesn't* expose an IOMMU to the guest, obviously the
devices are in passthrough mode. If the guest kernel doesn't have IOMMU
support enabled, then obviously the devices are in passthrough mode.
And if the ACPI tables exposed to the guest kernel *tell* it that the
virtio devices are not actually behind the IOMMU (which qemu gets
wrong), then it'll be in passthrough mode.

If the IOMMU is exposed, and enabled, and telling the guest kernel that
it *does* cover the virtio devices, then those virtio devices will
*not* be in passthrough mode.
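
The conditions above can be sketched as a small predicate. This is only
an illustration of the reasoning, not kernel code: struct guest_view and
device_in_passthrough() are hypothetical names invented for this sketch.

```c
#include <stdbool.h>

/* Hypothetical summary of what the guest sees; not a real kernel structure. */
struct guest_view {
    bool vmm_exposes_iommu;   /* the VMM presents an IOMMU to the guest */
    bool guest_iommu_enabled; /* the guest kernel has IOMMU support enabled */
    bool tables_say_covered;  /* ACPI tables say the device is behind the IOMMU */
};

/* Returns true when the device ends up in passthrough mode. */
static bool device_in_passthrough(const struct guest_view *g)
{
    if (!g->vmm_exposes_iommu)
        return true;          /* no IOMMU exposed at all */
    if (!g->guest_iommu_enabled)
        return true;          /* IOMMU present but the guest does not use it */
    if (!g->tables_say_covered)
        return true;          /* tables exclude this device from the IOMMU */
    return false;             /* exposed, enabled, and covering the device */
}
```

Note that nothing in the predicate depends on what the device driver
does; the driver is not an input to the decision.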

Your choosing to use the DMA API in the virtio device drivers, instead of
being buggy, has nothing to do with whether the device is actually in
passthrough mode or not. Whether it's in passthrough mode or not, using
the DMA API is technically the right thing to do — because it should
either *do* the translation, or return a 1:1 mapped IOVA, as
appropriate.
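
A toy userspace model of that contract, assuming a trivial fixed-offset
"translation". Every name here (toy_device, toy_dma_map,
toy_driver_queue_buffer) is hypothetical and stands in for the real DMA
API only by analogy; it is a sketch, not how the kernel implements it.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_phys_t; /* CPU physical address */
typedef uint64_t toy_bus_t;  /* address the device puts on the bus */

/* Hypothetical per-device state, set up by the platform and not the driver. */
struct toy_device {
    bool translated;      /* true if an IOMMU translates this device's DMA */
    uint64_t iova_offset; /* toy translation: IOVA = physical + offset */
};

/* Stand-in for the mapping call: translate, or hand back a 1:1 address. */
static toy_bus_t toy_dma_map(const struct toy_device *dev, toy_phys_t pa)
{
    if (dev->translated)
        return pa + dev->iova_offset;
    return pa; /* passthrough: bus address equals physical address */
}

/* The driver always asks for a bus address and never inspects the mode. */
static toy_bus_t toy_driver_queue_buffer(const struct toy_device *dev,
                                         toy_phys_t buf_pa)
{
    return toy_dma_map(dev, buf_pa);
}
```

The driver path is identical in both modes; only the platform-owned
state in toy_device differs.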

> On powerpc and sparc, we *already* screwed up.  The host already tells
> the guest that there's an IOMMU and that it's *enabled* because those
> platforms don't have selective IOMMU coverage the way that x86 does.
> So we need to work around it.

No, we need it on x86 too because once we fix the virtio device driver
bug and make it start using the DMA API, then we start to trip up on
the qemu bug where it lies about which devices are covered by the
IOMMU.

Of course, we still have that same qemu bug w.r.t. assigned devices,
which it *also* claims are behind its IOMMU when they're not...

> I think that, if we want fancy virt-friendly IOMMU stuff like you're
> talking about, then the right thing to do is to create a virtio bus
> instead of pretending to be PCI.  That bus could have a virtio IOMMU
> and its own cross-platform enumeration mechanism for devices on the
> bus, and everything would be peachy.

That doesn't really help very much for the x86 case where the problem
is compatibility with *existing* (arguably broken) qemu
implementations.

Having said that, if this were real hardware I'd just be blacklisting
it and saying "Another BIOS with broken DMAR tables --> IOMMU
completely disabled". So perhaps we should just do that.

> I still don't understand what trick.  If we want virtio devices to be
> assignable, then they should be translated through the IOMMU, and the
> DMA API is the right interface for that.

The DMA API is the right interface *regardless* of whether there's
actual translation to be done. The device driver itself should not be
involved in any way with that decision.

When you want to access MMIO, you use ioremap() and writel() instead of
doing random crap for yourself. When you want DMA, you use the DMA API
to get a bus address for your device *even* if you expect there to be
no IOMMU and you expect it to precisely match the physical address. No
excuses.

-- 
dwmw2
