Hey,

I'm out of my QOM depth, so I'll just beg for help in advance. While testing vfio-pci hot-unplug, I noticed that the host seems to try to reclaim the device before QEMU is actually done with it. There's a very short race: libvirt sees the DEVICE_DELETED event and tries to unbind the physical device from vfio-pci. The use count is clearly non-zero, because the host driver tries to send a device request, but that event channel has already been torn down. Nearly immediately after, QEMU finally releases the device, but we can't do a proper reset due to some issues with device references in the kernel.

When I run gdb on QEMU with breakpoints at qapi_event_send_device_deleted() and vfio_instance_finalize(), the QAPI event happens first. Clearly this is horribly wrong, right? I can't unmap my references to the vfio device file until my instance_finalize is called, so I'm always going to have that open when libvirt takes the DEVICE_DELETED event as a cue to return the device to host drivers.
The call chains look like this:

#0  qapi_event_send_device_deleted (has_device=true, device=0x7f5ca3e36fb0 "hostdev0", path=0x7f5c89e84fe0 "/machine/peripheral/hostdev0", errp=0x7f5ca241f9e8 <error_abort>) at qapi-event.c:412
#1  0x00007f5ca1701608 in device_unparent (obj=0x7f5ca43ffc00) at hw/core/qdev.c:1115
#2  0x00007f5ca18b7891 in object_finalize_child_property (obj=0x7f5ca380f500, name=0x7f5ca3f21da0 "hostdev0", opaque=0x7f5ca43ffc00) at qom/object.c:1362
#3  0x00007f5ca18b56b2 in object_property_del_child (obj=0x7f5ca380f500, child=0x7f5ca43ffc00, errp=0x0) at qom/object.c:422
#4  0x00007f5ca18b5790 in object_unparent (obj=0x7f5ca43ffc00) at qom/object.c:441
#5  0x00007f5ca16c1f31 in acpi_pcihp_eject_slot (s=0x7f5ca4c41268, bsel=0, slots=4) at hw/acpi/pcihp.c:139

#0  vfio_instance_finalize (obj=0x7f5ca43ffc00) at /net/gimli/home/alwillia/Work/qemu.git/hw/vfio/pci.c:2731
#1  0x00007f5ca18b57c0 in object_deinit (obj=0x7f5ca43ffc00, type=0x7f5ca376f490) at qom/object.c:448
#2  0x00007f5ca18b5831 in object_finalize (data=0x7f5ca43ffc00) at qom/object.c:462
#3  0x00007f5ca18b6782 in object_unref (obj=0x7f5ca43ffc00) at qom/object.c:896
#4  0x00007f5ca1550cc0 in memory_region_unref (mr=0x7f5ca43fff00) at /net/gimli/home/alwillia/Work/qemu.git/memory.c:1476
#5  0x00007f5ca1553886 in do_address_space_destroy (as=0x7f5ca43ffe10) at /net/gimli/home/alwillia/Work/qemu.git/memory.c:2272

It appears that DEVICE_DELETED only means the VM is done with the device, but libvirt is interpreting it as QEMU is done with the device. Which is correct? Do we need a new event or do we need to fix the ordering of this event? An ordering fix would be more compatible with existing libvirt.

Thanks,
Alex