Re: [PATCH v3 0/7] Add virtio-iommu driver

"Michael S. Tsirkin" <mst@xxxxxxxxxx> · Wed, 17 Oct 2018 11:23:50 -0400

On Wed, Oct 17, 2018 at 12:54:28PM +0100, Jean-Philippe Brucker wrote:
> On 16/10/2018 21:31, Auger Eric wrote:
> > Hi Jean,
> > 
> > On 10/16/18 8:44 PM, Jean-Philippe Brucker wrote:
> >> On 16/10/2018 10:25, Auger Eric wrote:
> >>> Hi Jean,
> >>>
> >>> On 10/12/18 4:59 PM, Jean-Philippe Brucker wrote:
> >>>> Implement the virtio-iommu driver, following specification v0.8 [1].
> >>>> Changes since v2 [2]:
> >>>>
> >>>> * Patches 2-4 allow virtio-iommu to use the PCI transport, since QEMU
> >>>>    would like to phase out the MMIO transport. This produces a complex
> >>>>    topology where the programming interface of the IOMMU could appear
> >>>>    lower than the endpoints that it translates. It's not unheard of (e.g.
> >>>>    AMD IOMMU), and the guest easily copes with this.
> >>>>    
> >>>>    The "Firmware description" section of the specification has been
> >>>>    updated with all combinations of PCI, MMIO and DT, ACPI.
> >>>
> >>> I have a question wrt the FW specification. The IOMMU consumes 1 slot in
> >>> the PCI domain and one needs to leave a RID hole in the iommu-map.  It
> >>> is not obvious to me that this RID always is predictable given the pcie
> >>> enumeration mechanism. Generally we have a coarse grain mapping of RID
> >>> onto iommu phandles/STREAMIDs. Here, if I understand correctly we need
> >>> to precisely identify the RID granted to the iommu. On QEMU this may
> >>> depend on the instantiation order of the virtio-pci device right?
> >> 
> >> Yes, although it should all happen before you boot the guest, since
> >> there is no hotplugging an IOMMU. Could you reserve a PCI slot upfront
> >> and use it for virtio-iommu later? Or generate the iommu-map at the same
> >> time as generating the child node of the PCI RC?
> > 
> > Even when cold-plugging the PCIe devices through qemu CLI, this depends
> > on the order of the pcie devices in the list I guess. I need to further
> > experiment.
> 
> Please let me know how it goes. I guess the problem will be the same for
> building IORT tables? You're also going to need a hole in the ID
> mappings of the PCI root complex node.
> 
> >>> So
> >>> this does not look trivial to build this info. Isn't it possible to do
> >>> this exclusion at kernel level instead?
> >> 
> >> So in theory VIRTIO_F_IOMMU_PLATFORM already does that:
> >> 
> >> VIRTIO_F_IOMMU_PLATFORM(33)
> >>     This feature indicates that the device is behind an IOMMU that
> >>     translates bus addresses from the device into physical addresses in
> >>     memory. If this feature bit is set to 0, then the device emits
> >>     physical addresses which are not translated further, even though an
> >>     IOMMU may be present.
> > 
> > This tells the driver to use the dma api, right? 
> 
> That's how Linux implements the bit, install custom DMA ops when the bit
> is absent. But it doesn't work for everyone and has caused a lot of
> debate (https://patchwork.ozlabs.org/cover/946708/)
> 
> > Effectively this
> > explicitly says whether the device is supposed to be upfront an IOMMU.
> 
> Yes. It's quite strange if you consider hotpluggable hardware, since
> those devices shouldn't get to choose whether they are managed by an
> IOMMU. For the IOMMU itself, it should be fine
> 
> >> For better or for worse, the guest has to implement it. If this feature
> >> bit is unset for virtio-iommu, it does DMA on the physical address
> >> space, regardless of what the static topology description says.
> >> 
> >> In practice it doesn't quite work. If your iommu-map describes the IOMMU
> >> as translating itself, Linux' OF code will wait for the IOMMU to be
> >> probed before probing the IOMMU. Working around this with hacks is
> >> possible, but I don't want to introduce more questionable code to OF and
> >> device tree bindings if there is any other way.
> > Hum ok. I cannot really comment on this.
> > 
> > I just wanted to raise this concern about RID identfication.
> 
> We can always try. Relaxing iommu-map further would be one additional
> patch to Documentation/devicetree/bindings/pci/pci-iommu.txt, and one to
> drivers/iommu/of-iommu.c. I'd rather make it a separate RFC.
> 
> Since we need acks from an OF maintainer and I'd also like Joerg's
> approval for adding a new driver to the IOMMU tree, I think it's too
> late for this iteration. I wasn't intending for this to go into 4.20,
> just have something to discuss at KVM forum next week.
> 
> Thanks,
> Jean

OK then. I'd appreciate it if you mark patches that aren't
intended to be merged as RFC in subject line.
Thanks!

-- 
MST