On Mon, 4 Jul 2022, Bjorn Helgaas wrote:

> I cc'd KVM folks in case they have anything to add here because I'm
> not a VFIO passthrough expert.
>
> It sounds like the problem occurs when the VFIO driver claims the GPU.
> I assume that happens after boot, when setting up for the virtual
> machine?

No, this is during boot, long before a VM is launched, as you can kinda
see from these lines from early on in the boot process:

[ 22.066610] amdgpu 0000:0e:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 25.726469] vfio-pci 0000:0f:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none

The vfio-pci driver claims the device like a typical GPU driver would,
but since it isn't one, the card's display output stops working: part of
the vfio-pci driver's job is to make sure the card is in an unused and,
preferably, as pristine a state as possible for when the VM takes
control of it.

If we go back earlier in the boot process, you'll see that second line
again:

[ 9.226635] vfio-pci 0000:0f:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[ 9.238385] vfio_pci: add [10de:1f06[ffffffff:ffffffff]] class 0x000000/00000000
[ 9.251529] vfio_pci: add [10de:10f9[ffffffff:ffffffff]] class 0x000000/00000000
[ 9.264328] vfio_pci: add [10de:1ada[ffffffff:ffffffff]] class 0x000000/00000000
[ 9.277162] vfio_pci: add [10de:1adb[ffffffff:ffffffff]] class 0x000000/00000000

If that device is the one selected by the arbiter as boot device, then
that is the point where display output stops and everything goes to
black.

> If so, is there a way to avoid the problem at run-time so the admin
> doesn't have to decide at boot-time which GPU will be passed through to
> a VM?

With the way that many people like me run this kind of setup, the
passthrough GPU gets reserved at boot-time anyway by passing a line
like:

vfio_pci.ids=10de:1f06,10de:10f9,10de:1ada,10de:1adb

on the kernel command line from the bootloader. Doing a similar
reservation for the host GPU with something like 'vgaarb.bootdev=0e:00.0'
alongside it should be no big deal to anyone running a setup like this.

You can bind/unbind devices to the vfio-pci driver at run-time using
sysfs[1], but as far as I can tell, there is no way to change the boot
VGA device at run-time.

> Is it possible or desirable to pass through GPU A to VM A, then after
> VM A exits, pass through GPU B to VM B?

Yeah, there are many ways one can run this setup. Some run with a single
GPU that gets passed through, leaving the host headless. There are
probably setups with more than two GPUs, with multiple VMs each getting
their own.

The setup I'm running is pretty common: a dedicated GPU for the host
(doesn't need to be anything special, just needs to handle workstation
duties) and a dedicated GPU for a Windows VM for gaming (something quite
powerful for those high FPS :-)

As you can see, statically assigning the devices ahead of time is okay.
The real problem (for me anyway) is that there's no way in the UEFI/BIOS
to tell the firmware which device should be used for boot. Sometimes it
picks the first GPU, sometimes the second. If it picks wrong, I get an
unusable system because the VGA arbiter deems the GPU selected by the
firmware to be the best choice for the boot VGA device.

--
Cal Peake

[1] /sys/bus/pci/drivers/vfio-pci
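
To make the boot-time reservation above concrete, the combined kernel
command line would look something like this (assuming a vgaarb.bootdev
option along the lines mentioned above; the IDs and address are the ones
from this particular system):

  vfio_pci.ids=10de:1f06,10de:10f9,10de:1ada,10de:1adb vgaarb.bootdev=0e:00.0

And a rough, untested sketch of the run-time bind/unbind via sysfs[1],
using the 0000:0f:00.0 address from the logs above and the generic PCI
driver_override mechanism (adjust for your own devices):

  # detach the GPU from whatever driver currently has it
  echo 0000:0f:00.0 > /sys/bus/pci/devices/0000:0f:00.0/driver/unbind

  # prefer vfio-pci for this device, then ask the PCI core to reprobe it
  echo vfio-pci > /sys/bus/pci/devices/0000:0f:00.0/driver_override
  echo 0000:0f:00.0 > /sys/bus/pci/drivers_probe

As noted above, though, none of this changes which device the VGA
arbiter already picked as the boot VGA device.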