On Sat, 22 Aug 2020 10:46:49 +0200
Niklas Schnelle <schnelle@xxxxxxxxxxxxx> wrote:

> Hi Alex, Hi Cornelia,
>
> yesterday I wanted to test a variant of Matthew's patch for our detached VF
> problem on an x86_64 system, to make sure we don't break anything there.
> However, I seem to have stumbled over a vfio-pci regression in v5.9-rc1
> (without the patch); it works fine on 5.8.1.
> I haven't done a bisect yet but will as soon as I get to it.
>
> The problem occurs immediately when attaching or booting a KVM VM with
> a vfio-pci pass-through device. With virsh I get:
>
> % sudo virsh start ubuntu20.04
> [sudo] password for XXXXXX:
> error: Failed to start domain ubuntu20.04
> error: internal error: qemu unexpectedly closed the monitor: 2020-08-22T08:21:12.663319Z qemu-system-x86_64: -device vfio-pci,host=0000:03:10.2,id=hostdev0,bus=pci.6,addr=0x0: VFIO_MAP_DMA failed: Cannot allocate memory
> 2020-08-22T08:21:12.663344Z qemu-system-x86_64: -device vfio-pci,host=0000:03:10.2,id=hostdev0,bus=pci.6,addr=0x0: VFIO_MAP_DMA failed: Cannot allocate memory
> 2020-08-22T08:21:12.663360Z qemu-system-x86_64: -device vfio-pci,host=0000:03:10.2,id=hostdev0,bus=pci.6,addr=0x0: VFIO_MAP_DMA failed: Cannot allocate memory
> 2020-08-22T08:21:12.667207Z qemu-system-x86_64: -device vfio-pci,host=0000:03:10.2,id=hostdev0,bus=pci.6,addr=0x0: VFIO_MAP_DMA failed: Cannot allocate memory
> 2020-08-22T08:21:12.667265Z qemu-system-x86_64: -device vfio-pci,host=0000:03:10.2,id=hostdev0,bus=pci.6,addr=0x0: vfio 0000:03:10.2: failed to setup container for group 54: memory listener initialization failed: Region pc.ram: vfio_dma_map(0x55ceedea1610, 0x0, 0xa0000, 0x7efcc7e00000) = -12 (Cannot allocate memory)
>
> and in dmesg:
>
> [ 379.368222] VFIO - User Level meta-driver version: 0.3
> [ 379.435459] ixgbe 0000:03:00.0 enp3s0: VF Reset msg received from vf 1
> [ 379.663384] cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activation
> [ 379.764947] vfio_pin_pages_remote: RLIMIT_MEMLOCK (9663676416) exceeded
> [ 379.764972] vfio_pin_pages_remote: RLIMIT_MEMLOCK (9663676416) exceeded
> [ 379.764989] vfio_pin_pages_remote: RLIMIT_MEMLOCK (9663676416) exceeded
> [ 379.768836] vfio_pin_pages_remote: RLIMIT_MEMLOCK (9663676416) exceeded
> [ 379.979310] ixgbevf 0000:03:10.2: enabling device (0000 -> 0002)
> [ 379.979505] ixgbe 0000:03:00.0 enp3s0: VF Reset msg received from vf 1
> [ 379.992624] ixgbevf 0000:03:10.2: 2e:7a:3e:95:5d:be
> [ 379.992627] ixgbevf 0000:03:10.2: MAC: 1
> [ 379.992629] ixgbevf 0000:03:10.2: Intel(R) 82599 Virtual Function
> [ 379.993594] ixgbevf 0000:03:10.2 enp3s0v1: renamed from eth1
> [ 380.043490] ixgbevf 0000:03:10.2: NIC Link is Up 1 Gbps
> [ 380.045081] IPv6: ADDRCONF(NETDEV_CHANGE): enp3s0v1: link becomes ready
>
> This does not seem to be device related; I initially tried with
> a VF of an Intel 82599 10 Gigabit NIC but also tried other
> physical PCI devices. I also initially tried increasing the ulimit,
> but looking at the code the limit is actually 9663676416 bytes,
> so that should be plenty.
>
> Simply rebooting into v5.8.1 (the official Arch Linux kernel, which is
> pretty much exactly Greg's stable series; I based my config on its config)
> fixes the issue, and the same setup works perfectly.
> In most documentation people only use Intel boxes for pass-through,
> so I should mention that this is an AMD Ryzen 9 3900X
> with a Starship/Matisse IOMMU, and my kernel command line contains
> "amd_iommu=on iommu=pt".
> Does any of this ring a bell for you, or do we definitely need
> a full bisect or any other information?

Hi Niklas,

It does not sound familiar to me and I haven't encountered it in my
testing of rc1, though I haven't tested on AMD specifically. There was
nothing from my pull request that should have affected page pinning.
Please let us know your results. Thanks,

Alex