On Wed, Oct 17, 2018 at 12:54:28PM +0100, Jean-Philippe Brucker wrote: > On 16/10/2018 21:31, Auger Eric wrote: > > Hi Jean, > > > > On 10/16/18 8:44 PM, Jean-Philippe Brucker wrote: > >> On 16/10/2018 10:25, Auger Eric wrote: > >>> Hi Jean, > >>> > >>> On 10/12/18 4:59 PM, Jean-Philippe Brucker wrote: > >>>> Implement the virtio-iommu driver, following specification v0.8 [1]. > >>>> Changes since v2 [2]: > >>>> > >>>> * Patches 2-4 allow virtio-iommu to use the PCI transport, since QEMU > >>>> would like to phase out the MMIO transport. This produces a complex > >>>> topology where the programming interface of the IOMMU could appear > >>>> lower than the endpoints that it translates. It's not unheard of (e.g. > >>>> AMD IOMMU), and the guest easily copes with this. > >>>> > >>>> The "Firmware description" section of the specification has been > >>>> updated with all combinations of PCI, MMIO and DT, ACPI. > >>> > >>> I have a question wrt the FW specification. The IOMMU consumes 1 slot in > >>> the PCI domain and one needs to leave a RID hole in the iommu-map. It > >>> is not obvious to me that this RID always is predictable given the pcie > >>> enumeration mechanism. Generally we have a coarse grain mapping of RID > >>> onto iommu phandles/STREAMIDs. Here, if I understand correctly we need > >>> to precisely identify the RID granted to the iommu. On QEMU this may > >>> depend on the instantiation order of the virtio-pci device right? > >> > >> Yes, although it should all happen before you boot the guest, since > >> there is no hotplugging an IOMMU. Could you reserve a PCI slot upfront > >> and use it for virtio-iommu later? Or generate the iommu-map at the same > >> time as generating the child node of the PCI RC? > > > > Even when cold-plugging the PCIe devices through qemu CLI, this depends > > on the order of the pcie devices in the list I guess. I need to further > > experiment. > > Please let me know how it goes. I guess the problem will be the same for > building IORT tables? You're also going to need a hole in the ID > mappings of the PCI root complex node. > > >>> So > >>> this does not look trivial to build this info. Isn't it possible to do > >>> this exclusion at kernel level instead? > >> > >> So in theory VIRTIO_F_IOMMU_PLATFORM already does that: > >> > >> VIRTIO_F_IOMMU_PLATFORM(33) > >> This feature indicates that the device is behind an IOMMU that > >> translates bus addresses from the device into physical addresses in > >> memory. If this feature bit is set to 0, then the device emits > >> physical addresses which are not translated further, even though an > >> IOMMU may be present. > > > > This tells the driver to use the dma api, right? > > That's how Linux implements the bit, install custom DMA ops when the bit > is absent. But it doesn't work for everyone and has caused a lot of > debate (https://patchwork.ozlabs.org/cover/946708/) > > > Effectively this > > explicitly says whether the device is supposed to be upfront an IOMMU. > > Yes. It's quite strange if you consider hotpluggable hardware, since > those devices shouldn't get to choose whether they are managed by an > IOMMU. For the IOMMU itself, it should be fine > > >> For better or for worse, the guest has to implement it. If this feature > >> bit is unset for virtio-iommu, it does DMA on the physical address > >> space, regardless of what the static topology description says. > >> > >> In practice it doesn't quite work. If your iommu-map describes the IOMMU > >> as translating itself, Linux' OF code will wait for the IOMMU to be > >> probed before probing the IOMMU. Working around this with hacks is > >> possible, but I don't want to introduce more questionable code to OF and > >> device tree bindings if there is any other way. > > Hum ok. I cannot really comment on this. > > > > I just wanted to raise this concern about RID identfication. > > We can always try. Relaxing iommu-map further would be one additional > patch to Documentation/devicetree/bindings/pci/pci-iommu.txt, and one to > drivers/iommu/of-iommu.c. I'd rather make it a separate RFC. > > Since we need acks from an OF maintainer and I'd also like Joerg's > approval for adding a new driver to the IOMMU tree, I think it's too > late for this iteration. I wasn't intending for this to go into 4.20, > just have something to discuss at KVM forum next week. > > Thanks, > Jean OK then. I'd appreciate it if you mark patches that aren't intended to be merged as RFC in subject line. Thanks! -- MST