Re: [PATCH v3 3/7] PCI: OF: Allow endpoints to bypass the iommu

Robin Murphy <robin.murphy@xxxxxxx> · Thu, 18 Oct 2018 11:47:18 +0100

On 17/10/18 16:14, Michael S. Tsirkin wrote:
On Mon, Oct 15, 2018 at 08:46:41PM +0100, Jean-philippe Brucker wrote:
[Replying with my personal address because we're having SMTP issues]

On 15/10/2018 11:52, Michael S. Tsirkin wrote:
On Fri, Oct 12, 2018 at 02:41:59PM -0500, Bjorn Helgaas wrote:
s/iommu/IOMMU/ in subject

On Fri, Oct 12, 2018 at 03:59:13PM +0100, Jean-Philippe Brucker wrote:
Using the iommu-map binding, endpoints in a given PCI domain can be
managed by different IOMMUs. Some virtual machines may allow a subset of
endpoints to bypass the IOMMU. In some case the IOMMU itself is presented

s/case/cases/

as a PCI endpoint (e.g. AMD IOMMU and virtio-iommu). Currently, when a
PCI root complex has an iommu-map property, the driver requires all
endpoints to be described by the property. Allow the iommu-map property to
have gaps.
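
For illustration only, here is a minimal user-space sketch of what an
iommu-map lookup with gaps could look like. It is not the code from this
patch. The entry layout follows the (rid-base, iommu, iommu-base, length)
tuples of the iommu-map binding, but the struct, the function name
iommu_map_lookup() and the integer standing in for the IOMMU phandle are
all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One iommu-map entry: (rid-base, iommu, iommu-base, length). */
struct iommu_map_entry {
	uint32_t rid_base;   /* first Requester ID covered by this entry */
	uint32_t iommu;      /* hypothetical stand-in for the IOMMU phandle */
	uint32_t iommu_base; /* ID that rid_base maps to on that IOMMU */
	uint32_t length;     /* number of consecutive RIDs covered */
};

/*
 * Translate a Requester ID through the map.
 *
 * Returns true and fills *iommu and *id when an entry covers the RID.
 * Returns false when the RID falls in a gap, which in this sketch means
 * "not managed by any IOMMU" rather than "malformed map".
 */
static bool iommu_map_lookup(const struct iommu_map_entry *map, size_t n,
			     uint32_t rid, uint32_t *iommu, uint32_t *id)
{
	for (size_t i = 0; i < n; i++) {
		const struct iommu_map_entry *e = &map[i];

		/* Skip entries whose [rid_base, rid_base + length) misses rid. */
		if (rid < e->rid_base || rid - e->rid_base >= e->length)
			continue;

		*iommu = e->iommu;
		*id = e->iommu_base + (rid - e->rid_base);
		return true;
	}

	return false; /* gap: the endpoint bypasses the IOMMU */
}
```

With a map made of { 0x0000, 1, 0x0000, 0x0008 } and
{ 0x0010, 1, 0x0010, 0xfff0 }, RID 0x0008 (device 00:01.0, for example an
IOMMU that is itself a PCI endpoint) falls in the gap and is reported as
untranslated, while every other RID maps to itself on IOMMU 1.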

I'm not an IOMMU or virtio expert, so it's not obvious to me why it is
safe to allow devices to bypass the IOMMU.  Does this mean a typo in
iommu-map could inadvertently allow devices to bypass it?

Thinking about this comment, I would like to ask: can't the
virtio device indicate the ranges in a portable way?
This would minimize the dependency on dt bindings and ACPI,
enabling support for systems that have neither but do
have virtio e.g. through pci.

I thought about adding a PROBE request for this in virtio-iommu, but it
wouldn't be usable by a Linux guest because of a bootstrapping problem.

Hmm. At some level it seems wrong to design hardware interfaces
around how Linux happens to probe things. That can change at any time
...

This isn't Linux-specific though. In general it's somewhere between 
difficult and impossible to pull in an IOMMU underneath a device after 
that device is active, so if any OS wants to use an IOMMU, it's going to
want to know up-front that it's there and which devices it translates so 
that it can program said IOMMU appropriately *before* potentially 
starting DMA and/or interrupts from the relevant devices. Linux happens 
to do things in that order (either by firmware-driven probe-deferral or 
just perilous initcall ordering) because it is the only reasonable order 
in which to do them. AFAIK the platforms which don't rely on any 
firmware description of their IOMMU tend to have a fairly static system 
architecture (such that the OS simply makes hard-coded assumptions), so 
it's not necessarily entirely clear how they would cope with 
virtio-iommu either way.

Robin.

Early on, Linux needs a description of device dependencies, to determine
in which order to probe them. If the device dependency was described by
virtio-iommu itself, the guest could for example initialize a NIC,
allocate buffers and start DMA on the physical address space (which aborts
if the IOMMU implementation disallows DMA by default), only to find out
once the virtio-iommu module is loaded that it needs to cancel all DMA and
reconfigure the NIC. With a static description such as iommu-map in DT or
ACPI remapping tables, the guest can defer probing of the NIC until the
IOMMU is initialized.
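
The ordering described above can be sketched as a small user-space
simulation. This is an assumption-laden illustration, not kernel code:
struct iommu, struct nic and nic_probe() are hypothetical names, and only
the EPROBE_DEFER value is taken from the Linux kernel. The point is that a
static description lets the NIC know about its IOMMU before it touches DMA.

```c
#include <stdbool.h>
#include <stddef.h>

#define EPROBE_DEFER 517 /* same value as in the Linux kernel */

/* Hypothetical model of an IOMMU that may not be initialized yet. */
struct iommu {
	bool ready;
};

/* Hypothetical NIC; iommu is NULL when the static description has no entry. */
struct nic {
	const struct iommu *iommu;
	bool dma_started;
};

/*
 * Probe the NIC. If the static description says an IOMMU translates this
 * device and that IOMMU is not ready, ask to be probed again later instead
 * of starting DMA on the physical address space.
 */
static int nic_probe(struct nic *nic)
{
	if (nic->iommu != NULL && !nic->iommu->ready)
		return -EPROBE_DEFER;

	nic->dma_started = true;
	return 0;
}
```

Calling nic_probe() before the IOMMU is ready returns -EPROBE_DEFER and
leaves DMA untouched; calling it again once ready is set succeeds. A NIC
with no IOMMU in its description probes immediately.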

Thanks,
Jean

Could you point me at the code you refer to here?