On Mon, Nov 9, 2015 at 6:04 PM, Benjamin Herrenschmidt
<benh@xxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, 2015-11-09 at 16:46 -0800, Andy Lutomirski wrote:
>> The problem here is that in some of the problematic cases the virtio
>> driver may not even be loaded. If someone runs an L1 guest with an
>> IOMMU-bypassing virtio device and assigns it to L2 using vfio, then
>> *boom* L1 crashes. (Same if, say, DPDK gets used, I think.)
>>
>> > The only way out of this while keeping the "platform" stuff would be
>> > to also bump some kind of version in the virtio config (or PCI
>> > header). I have no other way to differentiate between "this is an old
>> > qemu that doesn't do the 'bypass property' yet" and "this is a virtio
>> > device that doesn't bypass".
>> >
>> > Any better idea?
>>
>> I'd suggest that, in the absence of the new DT binding, we assume that
>> any PCI device with the virtio vendor ID is passthrough on powerpc. I
>> can do this in the virtio driver, but if it's in the platform code
>> then vfio gets it right too (i.e. fails to load).
>
> The problem is there isn't *a* virtio vendor ID. It's the RedHat vendor
> ID which will be used by more than just virtio, so we need to
> specifically list the devices.

Really?

/* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
static const struct pci_device_id virtio_pci_id_table[] = {
        { PCI_DEVICE(0x1af4, PCI_ANY_ID) },
        { 0 }
};

Can we match on that range?

> Additionally, that still means that once we have a virtio device that
> actually uses the iommu, powerpc will not work since the "workaround"
> above will kick in.

I don't know how to solve that problem, though, especially since the
vendor of such a device (especially if it's real hardware) might not
set any new bit.

> The "in absence of the new DT binding" doesn't make that much sense.
>
> Those platforms use device-trees defined since the dawn of ages by
> actual open firmware implementations; they either have no iommu
> representation in there (Macs, the platform code hooks it all up) or
> have various properties related to the iommu but no concept of "bypass"
> in there.
>
> We can *add* a new property under some circumstances that indicates a
> bypass on a per-device basis; however, that doesn't completely solve it:
>
>  - As I said above, what does the absence of that property mean? An
>    old qemu that does bypass on all virtio, or a new qemu trying to
>    tell you that the virtio device actually does use the iommu (or
>    some other environment that isn't qemu)?
>
>  - On things like Macs, the device-tree is generated by openbios; it
>    would have to have some added logic to try to figure that out, which
>    means it needs to know *via different means* that some or all virtio
>    devices bypass the iommu.
>
> I thus go back to my original statement: it's a LOT easier to handle if
> the device itself is self-describing, indicating whether it is set to
> bypass a host iommu or not. For L1->L2, well, that wouldn't be the
> first time qemu/VFIO plays tricks with the passed-through device
> configuration space...

Which leaves the special case of Xen, where even preexisting devices
don't bypass the IOMMU.

Can we keep this specific to powerpc and sparc? On x86, this problem
is basically nonexistent, since the IOMMU is properly self-describing.
IOW, I think that on x86 we should assume that all virtio devices
honor the IOMMU.
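Going back to the range question above: as a strawman (completely
untested, and the actual bypass hook is just a placeholder -- the real
one would live in the powerpc iommu setup code), matching that range
from a platform quirk could look something like:

#include <linux/pci.h>

/* Qumranet vendor ID; devices 0x1000 thru 0x10FF are virtio. */
#define QUMRANET_VENDOR_ID      0x1af4
#define VIRTIO_PCI_DEVICE_MIN   0x1000
#define VIRTIO_PCI_DEVICE_MAX   0x10ff

static void quirk_virtio_assume_iommu_bypass(struct pci_dev *pdev)
{
        /* Other 0x1af4 devices may exist; only the donated range is virtio. */
        if (pdev->device < VIRTIO_PCI_DEVICE_MIN ||
            pdev->device > VIRTIO_PCI_DEVICE_MAX)
                return;

        /*
         * Placeholder: the real hook would install direct/bypass DMA
         * ops for this device, so that vfio and DPDK-style users see
         * the right thing even when the virtio driver isn't loaded.
         */
        dev_info(&pdev->dev, "assuming virtio device bypasses the iommu\n");
}
DECLARE_PCI_FIXUP_HEADER(QUMRANET_VENDOR_ID, PCI_ANY_ID,
                         quirk_virtio_assume_iommu_bypass);

Since it runs in the platform code rather than in virtio-pci, the quirk
applies regardless of which driver (if any) ends up bound to the device.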
> Note that the above can be solved via some kind of compromise: the
> device self-describes the ability to honor the iommu, along with the
> property (or ACPI table entry) that indicates whether or not it does.
>
> I.e., we could use the revision or ProgIf field of the config space,
> for example, or something in the virtio config. If it's an "old"
> device, we know it always bypasses. If it's a new device, we know it
> only bypasses if the corresponding property is present. I still would
> have to sort out the openbios case for Macs among others, but it's at
> least a workable direction.
>
> BTW, don't you have a similar problem on x86, in that today qemu
> claims in ACPI that everything honors the iommu?

Only on a single experimental configuration, and that can apparently
just be fixed going forward without any real problems being caused.

--Andy
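P.S. To make the revision-field compromise concrete, a guest-side
check might look like the sketch below. The decision logic and the
"virtio,iommu-bypass" property name are invented for illustration;
nothing here is a settled binding:

#include <linux/of.h>
#include <linux/pci.h>

/*
 * Sketch of the compromise: revision 0 means an old device that
 * always bypasses; a newer revision means the device can honor the
 * iommu, and a (hypothetical) firmware property says whether it
 * actually does.
 */
static bool virtio_pci_bypasses_iommu(struct pci_dev *pdev)
{
        struct device_node *np = pci_device_to_OF_node(pdev);

        if (pdev->revision == 0)
                return true;    /* old device: always bypasses */

        /* "virtio,iommu-bypass" is an invented property name */
        return np && of_property_read_bool(np, "virtio,iommu-bypass");
}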