Re: Device is ineligible for IOMMU domain attach due to platform RMRR requirement

Steven DuChene <steven.duchene@xxxxxx> · Sat, 07 Mar 2015 05:13:46 -0500

Alex:
What would happen when running an earlier kernel, one without your RMRR
patch, on a system known to have these RMRR problems? Could there be
instability when doing PCI passthrough of these same NVidia devices?

We have a Debian install on one of these same systems running a
3.14.23-2 kernel, and we are seeing some issues with PCI passthrough.
--
Steven DuChene

On 03/06/2015 11:43 PM, Alex Williamson wrote:
On Fri, 2015-03-06 at 22:10 -0500, Steven DuChene wrote:
Alex:
Thanks for your quick reply and the information. One question though:
When you say contact the platform vendor, are you talking about the
vendor of the GPU card (NVidia) or the vendor of the system hardware
(HP)? I.e., is the problem in the system BIOS/firmware or in the
firmware of the GPU card?

This seems likely to be the death knell of PCI passthrough, as the
likelihood of getting a system vendor to fix something this obscure
seems remote.

Hi Steven,

The problem is in the system firmware; the platform vendor in your case
is HP.  The issue is actually very limited.  Most platform vendors do
not make use of RMRRs beyond the recommendations of the VT-d spec.  This
limits RMRRs in the general case to a small set of devices that are not
generally used for PCI assignment anyway.  An exemption even exists for
RMRRs associated with USB devices since their usage is known to be
limited to early boot.  That effectively limits the scope for most
vendors to UMA graphics where PCI assignment does not yet work anyway.
I expect an exemption could also be added there once the RMRR usage is
discovered and documented.

In the case you've encountered, the RMRR usage is proprietary and we
cannot know the extent of ongoing usage.  We must therefore assume that
it is in use and that the RMRR requirement of the platform must be
honored.

Obviously our goal with this change is not to pick on any specific
vendor, but to restrict PCI assignment to cases where it can be
implemented safely, both for the platform and the VM.  RMRRs restrict
how the IOVA space for a device can be used; we cannot continue to
ignore that restriction, and supporting it in a PCI device assignment
model presents implementation issues.  HP engineers as well as the
upstream community
have been consulted on this change and agreed to the restriction.  As I
said, KVM is not the first hypervisor to implement this restriction and
PCI assignment continues to be a valuable feature on those hypervisors.
Even on affected systems, RMRRs typically only apply to physical PCI
devices.  The vast majority of PCI assignment applications are used with
networking devices where SR-IOV is far more prevalent and where SR-IOV
virtual functions are typically unencumbered by RMRRs.
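
As a rough way to see whether a given device could use the SR-IOV route
mentioned above, the sketch below reads the sriov_totalvfs attribute
that the Linux PCI core exposes in sysfs for SR-IOV capable physical
functions.  The function name and the sysfs_root parameter are
illustrative assumptions, not part of any existing tool, and a non-zero
result only says the device advertises VFs, not that those VFs are free
of RMRRs on a particular platform.

```python
from pathlib import Path


def sriov_total_vfs(bdf: str, sysfs_root: str = "/sys") -> int:
    """Return how many VFs a PCI device advertises, or 0 if none.

    bdf is the PCI address in domain:bus:device.function form.
    sysfs_root is a hypothetical parameter so the lookup can be
    pointed at a test directory instead of the real /sys.
    """
    attr = Path(sysfs_root) / "bus" / "pci" / "devices" / bdf / "sriov_totalvfs"
    try:
        return int(attr.read_text().strip())
    except FileNotFoundError:
        # The attribute is absent when the device has no SR-IOV capability
        # (or the device address does not exist).
        return 0
```

Calling it with the address shown by lspci -D for the device in
question returns 0 for a device without SR-IOV, which per the note
further down is the expected answer for the Tesla cards discussed here.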

I believe this change is in the best interest of PCI assignment users,
the scope of affected systems is not as widespread as it might seem from
your perspective, and workarounds are often available for the most
common use case in the form of SR-IOV VFs.  Unfortunately we don't have
SR-IOV for Nvidia Tesla cards, so again, all I can offer is to contact
the platform vendor to see if there's any chance of a firmware update
that might remove this restriction.  Thanks,

Alex

On 03/06/2015 01:10 AM, Alex Williamson wrote:
On Fri, 2015-03-06 at 00:20 -0500, Steven DuChene wrote:
I am attempting on Ubuntu 14.04 to configure PCI passthrough of an NVidia
K40 GPU card that is plugged into an HP DL580 rack-mounted server.
I have done all of the pre-work I have normally done in the past with
pci-stub, vfio, etc., but when I try to execute a qemu-system-x86_64
command that works on a similar version of Debian, I get the following
error in dmesg:

Device is ineligible for IOMMU domain attach due to platform RMRR
requirement. Contact your platform vendor.

I have read through the patch description from Alex at:

http://lists.linuxfoundation.org/pipermail/iommu/2014-June/008816.html

and I have read the IOMMU documentation at:

https://www.kernel.org/doc/Documentation/Intel-IOMMU.txt

but I still do not understand whether there is a fix for this, or what
it would be.

The Ubuntu 14.04 system where I am getting this error is running
3.16.0-30-generic.
The Debian system where I can do similar PCI passthrough of an NVidia K2
GPU device is running a 3.14.29-4 kernel.

Can anyone provide any insight into a fix or workaround for this?

Hi Steven,

The issue is that VT-d RMRRs are a platform imposed requirement that a
device continue to have identity mapped access to a platform defined
memory region at all times.  This requirement is fundamentally
incompatible with PCI device assignment where the address space of the
assigned device is defined by the VM.  The VT-d specification hints at
this restriction (8.4):

          The RMRR regions are expected to be used for legacy usages (such
          as USB, UMA Graphics, etc.) requiring reserved memory. Platform
          designers should avoid or limit use of reserved memory regions
          since these require system software to create holes in the DMA
          virtual address range available to system software and its
          drivers.

In order to support assignment of such devices and continue to honor the
RMRR, reserved memory regions would need to be imposed on the guest.
Doing this has a number of issues and it's not clear that it enables any
usable configurations due to the lack of isolation often implied by the
RMRRs.  RMRRs themselves imply some sort of communication conduit to the
platform, which it's also not clear should be allowed for a guest owned
device.
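
To make the "holes" concrete, here is a minimal sketch of what imposing
reserved regions on an address space amounts to: given an IOVA range
and a list of reserved regions, it computes what is left for general
use.  This is only an illustration of the arithmetic, not how the
kernel or QEMU implements anything; the function name and the example
addresses are made up.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (start, end) addresses


def usable_iova_ranges(space: Range, reserved: List[Range]) -> List[Range]:
    """Return the parts of `space` left after carving out `reserved`."""
    start, end = space
    usable = []
    cursor = start  # lowest address not yet classified
    for r_start, r_end in sorted(reserved):
        if r_end < cursor or r_start > end:
            continue  # region lies outside what remains of the space
        if r_start > cursor:
            usable.append((cursor, r_start - 1))
        cursor = max(cursor, r_end + 1)
    if cursor <= end:
        usable.append((cursor, end))
    return usable


# Hypothetical example: one reserved page in a small address space.
holes = usable_iova_ranges((0x0000, 0x3FFF), [(0x1000, 0x1FFF)])
```

In this made-up example the space splits into 0x0000-0x0FFF and
0x2000-0x3FFF.  A guest would have to keep its own memory layout out of
every such reserved range, which is one reason the approach is
unattractive.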

We also cannot continue the previous behavior of simply ignoring RMRRs
for assigned devices.  Not only does the platform require us to honor
them, failing to do so could have implications for the health and
integrity of both the platform and the VM.

As indicated by the dmesg warning, users encountering this problem
should contact their platform vendor, which is really the only course of
action that I can recommend.  Only the platform vendor can tell you why
they've imposed this requirement for the device and potentially offer a
remedy to remove that requirement.  KVM is not the first hypervisor to
impose this restriction for such devices.  The referenced patch was
tagged for stable, so you can expect that this change will eventually
trickle through all the distributions.  Sorry for the trouble, but it
really was a necessary change.  Thanks,

Alex
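
For anyone checking several hosts for this condition, a small sketch
like the following can pick the warning out of captured kernel log
text.  It matches only the message wording quoted earlier in this
thread; the function and constant names are illustrative assumptions,
and the prefix the kernel puts in front of the message (timestamp,
driver, device address) is deliberately not parsed.

```python
from typing import List

# Wording taken from the dmesg warning quoted in this thread.
RMRR_WARNING = (
    "Device is ineligible for IOMMU domain attach "
    "due to platform RMRR requirement"
)


def find_rmrr_rejections(log_text: str) -> List[str]:
    """Return the log lines that contain the RMRR attach warning."""
    return [line for line in log_text.splitlines() if RMRR_WARNING in line]
```

Feeding it the output of dmesg (which may need root on systems that
restrict kernel log access) returns an empty list on hosts where no
device has been rejected.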
