Re: Device is ineligible for IOMMU domain attach due to platform RMRR requirement

Alex Williamson <alex.williamson@xxxxxxxxxx> · Fri, 06 Mar 2015 21:43:09 -0700

On Fri, 2015-03-06 at 22:10 -0500, Steven DuChene wrote:
> Alex:
> Thanks for your quick reply and the information. One question though: 
> When you say contact the platform vendor, are you talking about the 
> vendor of the GPU card (NVidia) or the vendor of the system hardware 
> (HP)? I.e., is the problem in the system BIOS/firmware or in the firmware
> of the GPU card?
> 
> This seems like it is going to be the death knell of PCI passthrough,
> as the likelihood of getting a system vendor to fix some obscure thing
> like this seems remote.

Hi Steven,

The problem is in the system firmware; the platform vendor in your case
is HP.  The issue is actually very limited.  Most platform vendors do
not make use of RMRRs beyond the recommendations of the VT-d spec.  This
limits RMRRs in the general case to a small set of devices that are not
generally used for PCI assignment anyway.  An exemption even exists for
RMRRs associated with USB devices since their usage is known to be
limited to early boot.  That effectively limits the scope for most
vendors to UMA graphics where PCI assignment does not yet work anyway.
I expect an exemption could also be added there once the RMRR usage is
discovered and documented.

In the case you've encountered, the RMRR usage is proprietary and we
cannot know the extent of ongoing usage.  We must therefore assume that
it is in use and that the RMRR requirement of the platform must be
honored.
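
As a rough illustration of why an RMRR cannot simply be carried along
into a guest, the sketch below checks whether a guest-chosen IOVA
mapping collides with a range that must stay identity mapped.  The
addresses, the Range type, and the conflicts_with_rmrr() helper are all
hypothetical and made up for this example; this is not how the kernel
implements the check.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Range:
    start: int  # first address, inclusive
    end: int    # last address, inclusive


def overlaps(a: Range, b: Range) -> bool:
    """True if the two inclusive ranges share at least one address."""
    return a.start <= b.end and b.start <= a.end


def conflicts_with_rmrr(iova: Range, host_phys_start: int,
                        rmrrs: List[Range]) -> List[Range]:
    """Return the RMRRs that a requested IOVA mapping would violate.

    An RMRR requires IOVA == host physical address inside its range.
    A mapping that overlaps the RMRR but translates to some other host
    address (the normal case for guest RAM) breaks that requirement.
    """
    is_identity = iova.start == host_phys_start
    return [r for r in rmrrs if overlaps(iova, r) and not is_identity]


# Hypothetical platform RMRR covering 1 MiB just below 2 GiB.
rmrr = Range(0x7F000000, 0x7F0FFFFF)

# Hypothetical guest with 3 GiB of RAM at IOVA 0, backed by host memory
# that starts at 4 GiB.  The guest defines this layout, not the platform.
guest_ram = Range(0x00000000, 0xBFFFFFFF)
print(conflicts_with_rmrr(guest_ram, 0x100000000, [rmrr]))
```

With these made-up numbers the guest mapping covers the RMRR range, so
the only way to honor the RMRR would be to punch a hole in the address
space the guest expects to own.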

Obviously our goal with this change is not to pick on any specific
vendor, but to restrict PCI assignment to cases where it can be
implemented safely, both for the platform and the VM.  RMRRs restrict
how the IOVA space for a device can be used.  We cannot continue to
ignore that restriction, and honoring it presents implementation issues
in a PCI device assignment model.  HP engineers as well as the upstream
community have been consulted on this change and agreed to the
restriction.  As I said, KVM is not the first hypervisor to implement
this restriction, and PCI assignment continues to be a valuable feature
on those hypervisors.
Even on affected systems, RMRRs typically only apply to physical PCI
devices.  The vast majority of PCI assignment applications are used with
networking devices where SR-IOV is far more prevalent and where SR-IOV
virtual functions are typically unencumbered by RMRRs.
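
For anyone wanting to check whether a given device offers SR-IOV at
all, here is a minimal sketch that reads the sriov_totalvfs attribute
that Linux exposes in sysfs for SR-IOV capable physical functions.  The
helper name and the PCI address in the usage line are placeholders of
mine.  Note that this only reports whether VFs exist; it says nothing
about whether an RMRR applies to them.

```python
from pathlib import Path


def sriov_total_vfs(bdf: str, sysfs_root: str = "/sys") -> int:
    """Return how many VFs the physical function at `bdf` can expose.

    Returns 0 when the device has no sriov_totalvfs attribute, which is
    what you would see for a device without the SR-IOV capability (or
    for an address that does not exist).
    """
    attr = Path(sysfs_root) / "bus" / "pci" / "devices" / bdf / "sriov_totalvfs"
    try:
        return int(attr.read_text().strip())
    except FileNotFoundError:
        return 0


if __name__ == "__main__":
    # Placeholder address; substitute the one lspci reports for your device.
    print(sriov_total_vfs("0000:03:00.0"))
```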

I believe this change is in the best interest of PCI assignment users,
the scope of affected systems is not as widespread as it might seem from
your perspective, and workarounds are often available for the most
common use case in the form of SR-IOV VFs.  Unfortunately we don't have
SR-IOV for Nvidia Tesla cards, so again, all I can offer is to contact
the platform vendor to see if there's any chance of a firmware update
that might remove this restriction.  Thanks,

Alex

> On 03/06/2015 01:10 AM, Alex Williamson wrote:
> > On Fri, 2015-03-06 at 00:20 -0500, Steven DuChene wrote:
> >> I am attempting on ubuntu 14.04 to configure PCI passthrough of an NVidia
> >> K40 GPU card that is plugged into an HP DL580 rack mounted server.
> >> I have done all of the pre-work I normally have done in the past with
> >> pci-stub, vfio, etc., but when I try to execute a qemu-system-x86_64
> >> command that works on a similar version of debian, I get the following
> >> error in dmesg:
> >>
> >> Device is ineligible for IOMMU domain attach due to platform RMRR
> >> requirement. Contact your platform vendor.
> >>
> >> I have read through the patch description from Alex at:
> >>
> >> http://lists.linuxfoundation.org/pipermail/iommu/2014-June/008816.html
> >>
> >> and I have read the IOMMU documentation at:
> >>
> >> https://www.kernel.org/doc/Documentation/Intel-IOMMU.txt
> >>
> >> but I still do not understand whether there is a fix, or what it is.
> >>
> >> The ubuntu 14.04 system where I am getting this error is running
> >> 3.16.0-30-generic
> >> The debian system where I can do similar PCI passthrough of an NVidia K2
> >> GPU device is running a 3.14.29-4 kernel.
> >>
> >> Can anyone provide any insight into a fix or workaround for this?
> > Hi Steven,
> >
> > The issue is that VT-d RMRRs are a platform imposed requirement that a
> > device continue to have identity mapped access to a platform defined
> > memory region at all times.  This requirement is fundamentally
> > incompatible with PCI device assignment where the address space of the
> > assigned device is defined by the VM.  The VT-d specification hints at
> > this restriction (8.4):
> >
> >          The RMRR regions are expected to be used for legacy usages (such
> >          as USB, UMA Graphics, etc.) requiring reserved memory. Platform
> >          designers should avoid or limit use of reserved memory regions
> >          since these require system software to create holes in the DMA
> >          virtual address range available to system software and its
> >          drivers.
> >
> > In order to support assignment of such devices and continue to honor the
> > RMRR, reserved memory regions would need to be imposed on the guest.
> > Doing this has a number of issues and it's not clear that it enables any
> > usable configurations due to the lack of isolation often implied by the
> > RMRRs.  RMRRs themselves imply some sort of communication conduit to the
> > platform, which it's also not clear should be allowed for a guest owned
> > device.
> >
> > We also cannot continue the previous behavior of simply ignoring RMRRs
> > for assigned devices.  Not only does the platform require us to honor
> > them, failing to do so could have implications for the health and
> > integrity of both the platform and the VM.
> >
> > As indicated by the dmesg warning, users encountering this problem
> > should contact their platform vendor, which is really the only course of
> > action that I can recommend.  Only the platform vendor can tell you why
> > they've imposed this requirement for the device and potentially offer a
> > remedy to remove that requirement.  KVM is not the first hypervisor to
> > impose this restriction for such devices.  The referenced patch was
> > tagged for stable, so you can expect that this change will eventually
> > trickle through all the distributions.  Sorry for the trouble, but it
> > really was a necessary change.  Thanks,
> >
> > Alex
> >
> 
