Re: Question: KVM: Failed to bind vfio with PCI-e / SMMU on Juno-r2


 



On 16/03/2019 04:56, Leo Yan wrote:
Hi Robin,

On Fri, Mar 15, 2019 at 12:54:10PM +0000, Robin Murphy wrote:
Hi Leo,

Sorry for the delay - I'm on holiday this week, but since I've made the
mistake of glancing at my inbox I should probably save you from wasting any
more time...

Sorry for disturbing you during your holiday, and I appreciate your help.
There's no rush to reply.

On 2019-03-15 11:03 am, Auger Eric wrote:
Hi Leo,

+ Jean-Philippe

On 3/15/19 10:37 AM, Leo Yan wrote:
Hi Eric, Robin,

On Wed, Mar 13, 2019 at 11:24:25AM +0100, Auger Eric wrote:

[...]

If the NIC supports MSIs, they are logically used. This can easily be
checked on the host by issuing "cat /proc/interrupts | grep vfio". Can you
check whether the guest received any interrupts? I remember that Robin
said in the past that, on Juno, the MSI doorbell was in the PCI host
bridge window, and transactions towards the doorbell possibly could not
reach it since they were considered peer-to-peer.

I found Robin's explanation again. It was not related to the MSI IOVA being
within the PCI host bridge window, but rather to the RAM GPA colliding with
the host PCI config space?

"MSI doorbells integral to PCIe root complexes (and thus untranslatable)
typically have a programmable address, so could be anywhere. In the more
general category of "special hardware addresses", QEMU's default ARM
guest memory map puts RAM starting at 0x40000000; on the ARM Juno
platform, that happens to be where PCI config space starts; as Juno's
PCIe doesn't support ACS, peer-to-peer or anything clever, if you assign
the PCI bus to a guest (all of it, given the lack of ACS), the root
complex just sees the guest's attempts to DMA to "memory" as the device
attempting to access config space and aborts them."

Below is some following investigation at my side:

Firstly, I must admit that I don't fully understand the paragraph above; so,
based on the description, I am wondering whether I can use INTx mode and
whether that would be enough to avoid this hardware pitfall.

The problem above is that, during the assignment process, the virtualizer
maps the whole guest RAM through the IOMMU (+ the MSI doorbell on ARM) to
allow the device, which is programmed with GPAs, to access the whole guest
RAM. Unfortunately, if the device emits a DMA request with IOVA 0x40000000,
this IOVA is interpreted by the Juno RC as a transaction towards the PCIe
config space. So this DMA request will not go beyond the RC; it will never
reach the IOMMU and will never reach the guest RAM. Overall, the device is
therefore unable to reach part of the guest RAM.
That's how I interpret the above statement. Beyond that, I don't know the
details of the collision, as I don't have access to this HW. Nor do I know
whether this problem still exists on the r2 HW.
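(For concreteness, below is a minimal sketch of the mapping step described
above, as a VMM such as QEMU or kvmtool would perform it through the VFIO
type1 API; it is purely illustrative and not taken from either project. With
QEMU's default ARM guest map, guest RAM starts at GPA 0x40000000, so that is
exactly the bus address a device DMA carries when it reaches the Juno RC.)

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Illustrative sketch only: the VMM maps guest RAM into the IOMMU with
 * IOVA == GPA.  A device DMA to GPA 0x40000000 therefore leaves the
 * endpoint as bus address 0x40000000, which the Juno RC decodes as a
 * config space access and aborts before it ever reaches the SMMU.
 */
static int map_guest_ram(int container_fd, void *host_va,
			 uint64_t guest_pa, uint64_t size)
{
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (uintptr_t)host_va,	/* host userspace VA backing guest RAM */
		.iova  = guest_pa,		/* IOVA == GPA, e.g. 0x40000000 */
		.size  = size,
	};

	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}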

Thanks a lot for rephrasing, Eric :)

The short answer is that if you want PCI passthrough to work on Juno, the
guest memory map has to look like a Juno.

The PCIe root complex uses an internal lookup table to generate appropriate
AXI attributes for outgoing PCIe transactions; unfortunately this has no
notion of 'default' attributes, so addresses *must* match one of the
programmed windows in order to be valid. From memory, EDK2 sets up a 2GB
window covering the lower DRAM bank, an 8GB window covering the upper DRAM
bank, and a 1MB (or thereabouts) window covering the GICv2m region with
Device attributes.

I checked the kernel memblock info, and it gives the result below:

root@debian:~# cat /sys/kernel/debug/memblock/memory
    0: 0x0000000080000000..0x00000000feffffff
    1: 0x0000000880000000..0x00000009ffffffff

So I think the lower 2GB DRAM window is [0x8000_0000..0xfeff_ffff]
and the upper DRAM window is [0x8_8000_0000..0x9_ffff_ffff].

BTW, I am now using U-Boot rather than UEFI, so I'm not sure whether U-Boot
has programmed the memory windows for PCIe.  Could you point out which
registers should be set in UEFI, so that I can also check the related
configuration in U-Boot?

U-Boot does the same thing[1] - you can confirm that by checking whether PCIe works at all on the host ;)

Any PCIe transactions to addresses not within one of
those windows will be aborted by the RC without ever going out to the AXI
side where the SMMU lies (and I think anything matching the config space or
I/O space windows or a region claimed by a BAR will be aborted even earlier
as a peer-to-peer attempt regardless of the AXI Translation Table setup).

You could potentially modify the firmware to change the window
configuration, but the alignment restrictions make it awkward. I've only
ever tested passthrough on Juno using kvmtool, which IIRC already has guest
RAM in an appropriate place (and is trivially easy to hack if not) - I don't
remember if I ever actually tried guest MSI with that.

I made several attempts with kvmtool to tweak the memory regions, but with
no luck.  Since the host uses [0x8000_0000..0xfeff_ffff] as the first
valid memory window for PCIe, I tried to move all memory/IO regions into
this window with the changes below, but still with no luck:

diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
index b9d486d..43f78b1 100644
--- a/arm/include/arm-common/kvm-arch.h
+++ b/arm/include/arm-common/kvm-arch.h
@@ -7,10 +7,10 @@

  #include "arm-common/gic.h"

-#define ARM_IOPORT_AREA                _AC(0x0000000000000000, UL)
-#define ARM_MMIO_AREA          _AC(0x0000000000010000, UL)
-#define ARM_AXI_AREA           _AC(0x0000000040000000, UL)
-#define ARM_MEMORY_AREA                _AC(0x0000000080000000, UL)
+#define ARM_IOPORT_AREA                _AC(0x0000000080000000, UL)
+#define ARM_MMIO_AREA          _AC(0x0000000080010000, UL)
+#define ARM_AXI_AREA           _AC(0x0000000088000000, UL)
+#define ARM_MEMORY_AREA                _AC(0x0000000090000000, UL)

Anyway, I very much appreciate the suggestions; they are sufficient for me
to dig further into the memory-related details (e.g. PCIe configuration,
IOMMU, etc.), and I will keep you posted if I make any progress.

None of those should need to change (all the MMIO emulation stuff is irrelevant to PCIe DMA anyway) - provided you don't give the guest more than 2GB of RAM, passthrough with legacy INTx ought to work out-of-the-box. For MSIs to get through, you'll further need to change the host kernel to place its software MSI region[2] within any of the host bridge windows as well.

Robin.

[1] http://git.denx.de/?p=u-boot.git;a=blob;f=board/armltd/vexpress64/pcie.c;h=0608a5a88b941cdd362e9f231250a981aebab357;hb=HEAD#l95
[2] MSI_IOVA_BASE in drivers/iommu/arm-smmu.c
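(For illustration, the change referred to in [2] would look roughly like the
sketch below. The replacement value is only an assumption chosen for this
example: it merely needs to fall inside one of the RC windows described
above, here the lower ~2GB DRAM window, and stay clear of anything else
mapped for the guest.)

--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
-#define MSI_IOVA_BASE			0x8000000
+#define MSI_IOVA_BASE			0xfe000000	/* illustrative only: inside the 0x8000_0000..0xfeff_ffff window */
 #define MSI_IOVA_LENGTH			0x100000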
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


