vfio-pci regression on x86_64 and Kernel v5.9-rc1

Hi Alex, Hi Cornelia,

Yesterday I wanted to test a variant of Matthew's patch for our detached VF
problem on an x86_64 system, to make sure we don't break anything there.
However, I seem to have stumbled over a vfio-pci regression in v5.9-rc1
(without the patch); the same setup works fine on 5.8.1.
I haven't done a bisect yet but will as soon as I get to it.

The problem occurs immediately when attaching or booting a KVM VM with
a vfio-pci pass-through. With virsh I get:

% sudo virsh start ubuntu20.04
[sudo] password for XXXXXX:
error: Failed to start domain ubuntu20.04
error: internal error: qemu unexpectedly closed the monitor: 2020-08-22T08:21:12.663319Z qemu-system-x86_64: -device vfio-pci,host=0000:03:10.2,id=hostdev0,bus=pci.6,addr=0x0: VFIO_MAP_DMA failed: Cannot allocate memory
2020-08-22T08:21:12.663344Z qemu-system-x86_64: -device vfio-pci,host=0000:03:10.2,id=hostdev0,bus=pci.6,addr=0x0: VFIO_MAP_DMA failed: Cannot allocate memory
2020-08-22T08:21:12.663360Z qemu-system-x86_64: -device vfio-pci,host=0000:03:10.2,id=hostdev0,bus=pci.6,addr=0x0: VFIO_MAP_DMA failed: Cannot allocate memory
2020-08-22T08:21:12.667207Z qemu-system-x86_64: -device vfio-pci,host=0000:03:10.2,id=hostdev0,bus=pci.6,addr=0x0: VFIO_MAP_DMA failed: Cannot allocate memory
2020-08-22T08:21:12.667265Z qemu-system-x86_64: -device vfio-pci,host=0000:03:10.2,id=hostdev0,bus=pci.6,addr=0x0: vfio 0000:03:10.2: failed to setup container for group 54: memory listener initialization failed: Region pc.ram: vfio_dma_map(0x55ceedea1610, 0x0, 0xa0000, 0x7efcc7e00000) = -12 (Cannot allocate memory)

and in dmesg:

[  379.368222] VFIO - User Level meta-driver version: 0.3
[  379.435459] ixgbe 0000:03:00.0 enp3s0: VF Reset msg received from vf 1
[  379.663384] cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activation
[  379.764947] vfio_pin_pages_remote: RLIMIT_MEMLOCK (9663676416) exceeded
[  379.764972] vfio_pin_pages_remote: RLIMIT_MEMLOCK (9663676416) exceeded
[  379.764989] vfio_pin_pages_remote: RLIMIT_MEMLOCK (9663676416) exceeded
[  379.768836] vfio_pin_pages_remote: RLIMIT_MEMLOCK (9663676416) exceeded
[  379.979310] ixgbevf 0000:03:10.2: enabling device (0000 -> 0002)
[  379.979505] ixgbe 0000:03:00.0 enp3s0: VF Reset msg received from vf 1
[  379.992624] ixgbevf 0000:03:10.2: 2e:7a:3e:95:5d:be
[  379.992627] ixgbevf 0000:03:10.2: MAC: 1
[  379.992629] ixgbevf 0000:03:10.2: Intel(R) 82599 Virtual Function
[  379.993594] ixgbevf 0000:03:10.2 enp3s0v1: renamed from eth1
[  380.043490] ixgbevf 0000:03:10.2: NIC Link is Up 1 Gbps
[  380.045081] IPv6: ADDRCONF(NETDEV_CHANGE): enp3s0v1: link becomes ready

This does not seem to be device related: I initially tried with
a VF of an Intel 82599 10 Gigabit NIC but also tried other
physical PCI devices. I also initially tried increasing the ulimit,
but looking at the code it seems the limit is actually 9663676416 bytes
(9 GiB), so that should be plenty.
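
As a rough sanity check of those numbers, here is a minimal Python sketch.
It assumes that the third argument of the failing vfio_dma_map() call in
the QEMU error (0xa0000) is the mapping size in bytes and that the value
printed by vfio_pin_pages_remote is the effective limit in bytes. The
describe() helper is purely illustrative, and the getrlimit() call at the
end reports the limit of the Python process itself, not of the QEMU process.

```python
import resource

# Values copied from the logs above.
LIMIT_FROM_DMESG = 9663676416  # bytes, as printed by vfio_pin_pages_remote
MAP_SIZE = 0xA0000             # bytes, assumed to be the size argument of vfio_dma_map()


def describe(limit_bytes, map_bytes):
    """Convert both values to readable units and compare them (illustrative helper)."""
    return {
        "limit_gib": limit_bytes / 2**30,
        "map_kib": map_bytes / 2**10,
        "fits": map_bytes <= limit_bytes,
    }


info = describe(LIMIT_FROM_DMESG, MAP_SIZE)
print(info)

# Limit of the current process only; QEMU started by libvirt may have a different one.
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print("RLIMIT_MEMLOCK soft/hard:", soft, hard)
```

Under these assumptions the failing request is a 640 KiB mapping against
a 9 GiB limit, so the request on its own should not come anywhere near it.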

Simply rebooting into v5.8.1 (the official Arch Linux kernel, which is
pretty much exactly Greg's stable series and whose config I based mine on)
fixes the issue, and the same setup works perfectly.
Most documentation only covers Intel boxes for pass-through,
so I should mention that this is an AMD Ryzen 9 3900X
with Starship/Matisse IOMMU and my kernel command line contains
"amd_iommu=on iommu=pt".
Does any of this ring a bell for you, or do we definitely need
a full bisect or any other information?

Best regards,
Niklas Schnelle


