Hi,

I've got a weird hang here when unbinding a device from vfio-pci.
(Confirmed still hanging on 4.4.1; I originally hit it on Red Hat's
3.10.0-327.4.4, which could contain anything, so if you'd like me to
test a specific older version for regression, please just give me a tag.)

Here's a reproducer for my hardware. I'm binding an mlx4 IB card
(vendor:device ID is 15b3:1003, PCI address is 0000:90:00.0) to a VM,
starting with vhost-net on a freshly created tuntap:

------->8------
modprobe vhost_net
modprobe vfio-pci

# register the model, unbind the card from current driver, bind to
# vfio-pci and give device to qemu
echo "15b3 1003" > /sys/bus/pci/drivers/vfio-pci/new_id
echo 0000:90:00.0 > /sys/bus/pci/devices/0000:90:00.0/driver/unbind
echo 0000:90:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
chown qemu: /dev/vfio/32

# create tuntap, bring it up and open vhost-net fd
ip tuntap add dev tap-hang-0 mode tap user qemu
ip link set tap-hang-0 mtu 9000 up
exec 10<>/dev/vhost-net

# run qemu. This inits the first devices and exits because of the
# nonexistent device, so no guest OS is ever involved
su qemu -s/bin/sh -c '/usr/libexec/qemu-kvm --enable-kvm \
    -m 16G -smp 24 -device vfio-pci,id=ib0,host=90:00.0 \
    -netdev type=tap,id=guest0,ifname=tap-hang-0,script=no,downscript=no,vhost=on,vhostfd=10 \
    -device virtio-net-pci,netdev=guest0,mac=52:54:00:ff:17:12 -device nonexistant'

echo 0000:90:00.0 > /sys/bus/pci/drivers/vfio-pci/unbind
------->8------

The last command here hangs, and I get this in dmesg:

[ 163.458817] vfio-pci 0000:90:00.0: Relaying device request to user (#0)
[ 255.802784] vfio-pci 0000:90:00.0: Relaying device request to user (#10)
[ 355.805452] vfio-pci 0000:90:00.0: Relaying device request to user (#20)
[ 455.808150] vfio-pci 0000:90:00.0: Relaying device request to user (#30)
[ 555.810916] vfio-pci 0000:90:00.0: Relaying device request to user (#40)
[ 655.813904] vfio-pci 0000:90:00.0: Relaying device request to user (#50)
[ 755.816818] vfio-pci 0000:90:00.0: Relaying device request to user (#60)

On pressing ^C I get:

[ 205.793450] vfio-pci 0000:90:00.0: Device is currently in use, task "bash" (9719) blocked until device is released

Two extra observations:
 - this does not happen if I do not add vhost-net (so adding a network
   interface without vhost=on will not hang)
 - the exact same script running qemu as root will not hang either

I ran qemu with strace in both cases to compare the output, and after
normalizing pointers I do not notice any real difference (no EACCES or
similar error at least), so the difference is probably somewhere in
kernel land.

From what I gather we're basically stuck in vfio_del_group_dev() because
of a ref leak or something... Does that ring a bell for anyone?
Is there anywhere I could tune verbosity to help debug this?

Thanks,
-- 
Dominique Martinet