On 16/10/2018 21:31, Auger Eric wrote:
> Hi Jean,
>
> On 10/16/18 8:44 PM, Jean-Philippe Brucker wrote:
>> On 16/10/2018 10:25, Auger Eric wrote:
>>> Hi Jean,
>>>
>>> On 10/12/18 4:59 PM, Jean-Philippe Brucker wrote:
>>>> Implement the virtio-iommu driver, following specification v0.8 [1].
>>>> Changes since v2 [2]:
>>>>
>>>> * Patches 2-4 allow virtio-iommu to use the PCI transport, since QEMU
>>>>   would like to phase out the MMIO transport. This produces a complex
>>>>   topology where the programming interface of the IOMMU could appear
>>>>   lower than the endpoints that it translates. It's not unheard of
>>>>   (e.g. AMD IOMMU), and the guest easily copes with this.
>>>>
>>>>   The "Firmware description" section of the specification has been
>>>>   updated with all combinations of PCI, MMIO and DT, ACPI.
>>>
>>> I have a question wrt the FW specification. The IOMMU consumes 1 slot
>>> in the PCI domain and one needs to leave a RID hole in the iommu-map.
>>> It is not obvious to me that this RID is always predictable given the
>>> PCIe enumeration mechanism. Generally we have a coarse-grained mapping
>>> of RID onto iommu phandles/STREAMIDs. Here, if I understand correctly,
>>> we need to precisely identify the RID granted to the iommu. On QEMU
>>> this may depend on the instantiation order of the virtio-pci device,
>>> right?
>>
>> Yes, although it should all happen before you boot the guest, since
>> there is no hotplugging an IOMMU. Could you reserve a PCI slot upfront
>> and use it for virtio-iommu later? Or generate the iommu-map at the
>> same time as generating the child node of the PCI RC?
>
> Even when cold-plugging the PCIe devices through the qemu CLI, this
> depends on the order of the PCIe devices in the list I guess. I need to
> experiment further.

Please let me know how it goes. I guess the problem will be the same for
building IORT tables? You're also going to need a hole in the ID
mappings of the PCI root complex node.

>>> So it does not look trivial to build this info. Isn't it possible to
>>> do this exclusion at kernel level instead?
>>
>> So in theory VIRTIO_F_IOMMU_PLATFORM already does that:
>>
>> VIRTIO_F_IOMMU_PLATFORM(33)
>>   This feature indicates that the device is behind an IOMMU that
>>   translates bus addresses from the device into physical addresses in
>>   memory. If this feature bit is set to 0, then the device emits
>>   physical addresses which are not translated further, even though an
>>   IOMMU may be present.
>
> This tells the driver to use the DMA API, right?

That's how Linux implements the bit: install custom DMA ops when the bit
is absent. But it doesn't work for everyone and has caused a lot of
debate (https://patchwork.ozlabs.org/cover/946708/).

> Effectively this explicitly says whether the device is supposed to be
> behind an IOMMU.

Yes. It's quite strange if you consider hotpluggable hardware, since
those devices shouldn't get to choose whether they are managed by an
IOMMU. For the IOMMU itself, it should be fine.

>> For better or for worse, the guest has to implement it. If this
>> feature bit is unset for virtio-iommu, it does DMA on the physical
>> address space, regardless of what the static topology description
>> says.
>>
>> In practice it doesn't quite work. If your iommu-map describes the
>> IOMMU as translating itself, Linux' OF code will wait for the IOMMU to
>> be probed before probing the IOMMU. Working around this with hacks is
>> possible, but I don't want to introduce more questionable code to OF
>> and device tree bindings if there is any other way.
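
To make that concrete, here is a rough sketch of what I have in mind for
the PCI case. The devfn (00:01.0), node name and compatible string are
only an illustration of what the firmware could generate; the important
part is the hole that iommu-map leaves for the IOMMU's own RID, so that
the IOMMU isn't described as translating itself:

    pcie@10000000 {
        compatible = "pci-host-ecam-generic";
        /* ranges, #address-cells, etc. omitted */

        /* virtio-iommu programming interface at 00:01.0, i.e. RID 0x8 */
        iommu0: iommu@0008 {
            compatible = "virtio,pci-iommu";
            reg = <0x00000800 0 0 0 0>;
            #iommu-cells = <1>;
        };

        /* Translate every RID in the domain except 0x8 (the IOMMU) */
        iommu-map = <0x0 &iommu0 0x0 0x8>,
                    <0x9 &iommu0 0x9 0xfff7>;
    };

The ACPI case would need the same kind of hole in the IORT ID mappings
of the root complex node.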

> Hum ok. I cannot really comment on this.
>
> I just wanted to raise this concern about RID identification.

We can always try. Relaxing iommu-map further would be one additional
patch to Documentation/devicetree/bindings/pci/pci-iommu.txt, and one to
drivers/iommu/of_iommu.c. I'd rather make it a separate RFC. Since we
need acks from an OF maintainer, and I'd also like Joerg's approval for
adding a new driver to the IOMMU tree, I think it's too late for this
iteration. I wasn't intending for this to go into 4.20, just to have
something to discuss at KVM Forum next week.

Thanks,
Jean