vfio hang when unbinding after using qemu as user + vhost_net

Hi,

I've got a weird hang here when unbinding a device from vfio-pci.
(Confirmed still hanging on 4.4.1; I first hit it on Red Hat's
3.10.0-327.4.4, which could contain anything, so if you'd like me to
test a specific older version for a regression, please just give me a
tag.)


Here's a reproducer for my hardware. I'm passing an mlx4 IB card
(vendor:device ID 15b3:1003, PCI address 0000:90:00.0) through to a VM
started with vhost-net on a freshly created tuntap:

------->8------
modprobe vhost_net
modprobe vfio-pci

# register the model, unbind the card from current driver, bind to
# vfio-pci and give device to qemu
echo "15b3 1003" > /sys/bus/pci/drivers/vfio-pci/new_id
echo 0000:90:00.0 > /sys/bus/pci/devices/0000:90:00.0/driver/unbind
echo 0000:90:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
# /dev/vfio/32 is the device's IOMMU group node (32 on this machine)
chown qemu: /dev/vfio/32

# create tuntap, bring it up and open vhost-net fd
ip tuntap add dev tap-hang-0 mode tap user qemu
ip link set tap-hang-0 mtu 9000 up
exec 10<>/dev/vhost-net

# run qemu. This initializes the first devices, then exits because of
# the nonexistent device, so no guest OS is ever involved
su qemu -s/bin/sh -c '/usr/libexec/qemu-kvm --enable-kvm \
  -m 16G -smp 24 -device vfio-pci,id=ib0,host=90:00.0   \
  -netdev type=tap,id=guest0,ifname=tap-hang-0,script=no,downscript=no,vhost=on,vhostfd=10 \
  -device virtio-net-pci,netdev=guest0,mac=52:54:00:ff:17:12 -device nonexistant'

echo 0000:90:00.0 > /sys/bus/pci/drivers/vfio-pci/unbind
------->8------
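
The group number in /dev/vfio/32 above is specific to this machine. A
minimal sketch of how it could be looked up instead of hard-coded,
assuming the standard sysfs iommu_group symlink; iommu_group_of and its
optional sysfs-root argument are hypothetical names used only for
illustration:

```shell
# Hypothetical helper: print the IOMMU group number of a PCI device.
#   $1 = PCI address (e.g. 0000:90:00.0)
#   $2 = sysfs root, defaults to /sys (overridable for testing)
iommu_group_of() {
    # iommu_group is a symlink whose last path component is the group number
    link=$(readlink "${2:-/sys}/bus/pci/devices/$1/iommu_group") || return 1
    basename "$link"
}

# Possible use in the reproducer (not run here):
#   chown qemu: "/dev/vfio/$(iommu_group_of 0000:90:00.0)"
```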

The last command hangs, and I get this in dmesg:
[  163.458817] vfio-pci 0000:90:00.0: Relaying device request to user (#0)
[  255.802784] vfio-pci 0000:90:00.0: Relaying device request to user (#10)
[  355.805452] vfio-pci 0000:90:00.0: Relaying device request to user (#20)
[  455.808150] vfio-pci 0000:90:00.0: Relaying device request to user (#30)
[  555.810916] vfio-pci 0000:90:00.0: Relaying device request to user (#40)
[  655.813904] vfio-pci 0000:90:00.0: Relaying device request to user (#50)
[  755.816818] vfio-pci 0000:90:00.0: Relaying device request to user (#60)

On pressing ^C I get:
[  205.793450] vfio-pci 0000:90:00.0: Device is currently in use, task "bash" (9719) blocked until device is released


Two extra observations:
 - this does not happen if I do not add vhost-net (so adding a network
interface without vhost=on will not hang)
 - the exact same script running qemu as root will not hang either

I ran qemu under strace in both cases to compare the output, and after
normalizing pointers I do not notice any real difference (no EACCES or
similar error at least), so the difference is probably somewhere in
kernel land.
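
For anyone wanting to repeat the comparison, here is a rough sketch of
that kind of pointer normalization. It is an illustration, not the exact
command used here; normalize_ptrs and the trace file names are made up,
and the pattern also rewrites other lowercase hex constants that strace
prints:

```shell
# Hypothetical filter: replace every 0x... hex value with a fixed
# placeholder so two strace logs can be diffed line by line.
normalize_ptrs() {
    sed -E 's/0x[0-9a-f]+/0xPTR/g'
}

# Possible use (file names are placeholders, not run here):
#   normalize_ptrs < trace.root > trace.root.norm
#   normalize_ptrs < trace.qemu > trace.qemu.norm
#   diff -u trace.root.norm trace.qemu.norm
```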


From what I gather we're basically stuck in vfio_del_group_dev()
because of a ref leak or something...

Does that ring a bell for anyone?
Is there anywhere I could tune verbosity to help debug this?
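
One userspace-side check that may help narrow this down is a minimal
sketch that lists which processes still have a given device node open,
by scanning the fd symlinks under /proc. holders_of and its optional
proc-root argument are hypothetical names; it must run as root to see
other users' fds, and it cannot see references held only inside the
kernel:

```shell
# Hypothetical helper: print PIDs that have an open fd pointing at a path.
#   $1 = target path (e.g. /dev/vfio/32)
#   $2 = proc root, defaults to /proc (overridable for testing)
holders_of() {
    target=$1
    proc=${2:-/proc}
    for fd in "$proc"/[0-9]*/fd/*; do
        # each fd entry is a symlink to the file it refers to
        [ "$(readlink "$fd" 2>/dev/null)" = "$target" ] || continue
        pid=${fd#"$proc"/}      # strip the proc root prefix
        echo "${pid%%/*}"       # keep only the PID component
    done | sort -un
}

# Possible use while the unbind is hanging (not run here):
#   holders_of /dev/vfio/32
#   holders_of /dev/vhost-net
```

Note that in the reproducer the shell itself still has /dev/vhost-net
open on fd 10 after qemu exits, so it should show up in the second
listing. If nothing is listed for /dev/vfio/32, the remaining reference
is presumably held on the kernel side.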


Thanks,
-- 
Dominique Martinet