I am experiencing an issue with virtualizing a machine that contains 8 NVIDIA A100 80GB cards. As a bare-metal host, the machine behaves as expected. The GPUs are connected to the host through PEX88096 PLX chips, each of which connects 2 GPUs to 16 lanes on the CPU (this is the standard NVIDIA HGX Delta baseboard). When all GPUs and NVLink bridges are passed through to a VM, the guest can only initialize 4-5 of the 8 GPUs. The dmesg log shows failed attempts at assigning BAR space to the GPUs that do not come up.

Things that were tried:
- Q35 and i440fx machine types, with and without UEFI
- QEMU 5.x and QEMU 6.0
- Ubuntu 20.04 host with QEMU/libvirt
- now running Proxmox 7 on Debian 11, host kernel 5.11.22-2, VM kernel 5.4.0-77
- VM kernel parameters pci=nocrs and pci=realloc=on/off (applied as in the sketch below)
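For reference, this is roughly how those guest kernel parameters were applied. A minimal sketch, assuming a stock GRUB setup in the Ubuntu guest; the pci=realloc value was toggled between on and off across boots:

------------------------------------
# Inside the VM: /etc/default/grub (illustrative excerpt)
GRUB_CMDLINE_LINUX_DEFAULT="quiet pci=nocrs pci=realloc=on"

# Regenerate the GRUB configuration, then reboot the guest
update-grub
------------------------------------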
lspci -v in the VM:

01:00.0 3D controller: NVIDIA Corporation Device 20b2 (rev a1)
        Memory at db000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 2000000000 (64-bit, prefetchable) [size=128G]
        Memory at 1000000000 (64-bit, prefetchable) [size=32M]

02:00.0 3D controller: NVIDIA Corporation Device 20b2 (rev a1)
        Memory at dc000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 4000000000 (64-bit, prefetchable) [size=128G]
        Memory at 6000000000 (64-bit, prefetchable) [size=32M]
...
0c:00.0 3D controller: NVIDIA Corporation Device 20b2 (rev a1)
        Memory at e0000000 (32-bit, non-prefetchable) [size=16M]
        Memory at <ignored> (64-bit, prefetchable)
        Memory at <ignored> (64-bit, prefetchable)
...
------------------------------------

dmesg for a GPU that initializes correctly (01:00.0):

root@a100:~# dmesg | grep 01:00
[ 0.674363] pci 0000:01:00.0: [10de:20b2] type 00 class 0x030200
[ 0.674884] pci 0000:01:00.0: reg 0x10: [mem 0xff000000-0xffffffff]
[ 0.675010] pci 0000:01:00.0: reg 0x14: [mem 0xffffffe000000000-0xffffffffffffffff 64bit pref]
[ 0.675129] pci 0000:01:00.0: reg 0x1c: [mem 0xfffffffffe000000-0xffffffffffffffff 64bit pref]
[ 0.675416] pci 0000:01:00.0: Max Payload Size set to 128 (was 256, max 256)
[ 0.675567] pci 0000:01:00.0: Enabling HDA controller
[ 0.676324] pci 0000:01:00.0: PME# supported from D0 D3hot
[ 1.377980] pci 0000:01:00.0: can't claim BAR 0 [mem 0xff000000-0xffffffff]: no compatible bridge window
[ 1.377983] pci 0000:01:00.0: can't claim BAR 1 [mem 0xffffffe000000000-0xffffffffffffffff 64bit pref]: no compatible bridge window
[ 1.377986] pci 0000:01:00.0: can't claim BAR 3 [mem 0xfffffffffe000000-0xffffffffffffffff 64bit pref]: no compatible bridge window
[ 1.403889] pci 0000:01:00.0: BAR 1: assigned [mem 0x2000000000-0x3fffffffff 64bit pref]
[ 1.404120] pci 0000:01:00.0: BAR 3: assigned [mem 0x1000000000-0x1001ffffff 64bit pref]
[ 1.404335] pci 0000:01:00.0: BAR 0: assigned [mem 0xcf000000-0xcfffffff]
[ 4.214191] nvidia 0000:01:00.0: enabling device (0000 -> 0002)
[ 15.185625] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1

dmesg for a GPU that fails to initialize (06:00.0):

root@a100:~# dmesg | grep 06:00
[ 0.724589] pci 0000:06:00.0: [10de:20b2] type 00 class 0x030200
[ 0.724975] pci 0000:06:00.0: reg 0x10: [mem 0xff000000-0xffffffff]
[ 0.725069] pci 0000:06:00.0: reg 0x14: [mem 0xffffffe000000000-0xffffffffffffffff 64bit pref]
[ 0.725146] pci 0000:06:00.0: reg 0x1c: [mem 0xfffffffffe000000-0xffffffffffffffff 64bit pref]
[ 0.725343] pci 0000:06:00.0: Max Payload Size set to 128 (was 256, max 256)
[ 0.725471] pci 0000:06:00.0: Enabling HDA controller
[ 0.726051] pci 0000:06:00.0: PME# supported from D0 D3hot
[ 1.378149] pci 0000:06:00.0: can't claim BAR 0 [mem 0xff000000-0xffffffff]: no compatible bridge window
[ 1.378151] pci 0000:06:00.0: can't claim BAR 1 [mem 0xffffffe000000000-0xffffffffffffffff 64bit pref]: no compatible bridge window
[ 1.378154] pci 0000:06:00.0: can't claim BAR 3 [mem 0xfffffffffe000000-0xffffffffffffffff 64bit pref]: no compatible bridge window
[ 1.421549] pci 0000:06:00.0: BAR 1: no space for [mem size 0x2000000000 64bit pref]
[ 1.421553] pci 0000:06:00.0: BAR 1: trying firmware assignment [mem 0xffffffe000000000-0xffffffffffffffff 64bit pref]
[ 1.421556] pci 0000:06:00.0: BAR 1: [mem 0xffffffe000000000-0xffffffffffffffff 64bit pref] conflicts with PCI mem [mem 0x00000000-0xffffffffff]
[ 1.421559] pci 0000:06:00.0: BAR 1: failed to assign [mem size 0x2000000000 64bit pref]
[ 1.421562] pci 0000:06:00.0: BAR 3: no space for [mem size 0x02000000 64bit pref]
[ 1.421564] pci 0000:06:00.0: BAR 3: trying firmware assignment [mem 0xfffffffffe000000-0xffffffffffffffff 64bit pref]
[ 1.421567] pci 0000:06:00.0: BAR 3: [mem 0xfffffffffe000000-0xffffffffffffffff 64bit pref] conflicts with PCI mem [mem 0x00000000-0xffffffffff]
[ 1.421570] pci 0000:06:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]
[ 1.421573] pci 0000:06:00.0: BAR 0: assigned [mem 0xd4000000-0xd4ffffff]
[ 15.013778] nvidia 0000:06:00.0: enabling device (0000 -> 0002)
[ 15.191872] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:06:00.0 on minor 6
[ 26.946648] NVRM: GPU 0000:06:00.0: RmInitAdapter failed! (0x22:0xffff:662)
[ 26.948225] NVRM: GPU 0000:06:00.0: rm_init_adapter failed, device minor number 5
[ 26.982183] NVRM: GPU 0000:06:00.0: RmInitAdapter failed! (0x22:0xffff:662)
[ 26.983434] NVRM: GPU 0000:06:00.0: rm_init_adapter failed, device minor number 5
------------------------------------

I have (blindly) messed with parameters like pref64-reserve on the pcie-root-port (see the sketch in the P.S. below), but to be frank I have little idea what I am doing, so what I am looking for is suggestions on what to try next. This server will not be running an 8-GPU VM in production, but I have a few days left to test before it goes to work, and I was hoping to learn how to overcome this kind of issue in the future. Please be aware that my knowledge of virtualization and the Linux kernel does not reach far. Thanks in advance for your time!
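P.S. For completeness, this is what the pref64-reserve experiment looked like. A minimal sketch only: the port ID, chassis/slot numbers, host PCI address, and the 256G reservation size are placeholders I chose, not values I know to be right:

------------------------------------
# Illustrative QEMU arguments: a pcie-root-port with an enlarged
# 64-bit prefetchable window reservation, and one GPU attached
# behind it via VFIO (one such pair per GPU)
-device pcie-root-port,id=rp1,bus=pcie.0,chassis=1,slot=1,pref64-reserve=256G
-device vfio-pci,host=0000:25:00.0,bus=rp1
------------------------------------

As far as I understand, on a Q35 machine each passed-through device normally sits behind its own pcie-root-port, which is why I set the reservation per port, but I may well be misunderstanding how the window sizing works.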