On Wed, Oct 26, 2022 at 02:11:21AM -0400, Michael S. Tsirkin wrote:
> virtio uses the same driver for VFs and PFs. Accordingly,
> pci_device_is_present is used to detect device presence. This function
> isn't currently working properly for VFs since it attempts reading
> device and vendor ID. As VFs are present if and only if PF is present,
> just return the value for that device.
>
> Reported-by: Wei Gong <gongwei833x@xxxxxxxxx>
> Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
> ---
>
> Wei Gong, thanks for your testing of the RFC!
> As I made a small change, would appreciate re-testing.
>
> Thanks!
>
> changes from RFC:
>     use pci_physfn() wrapper to make the code build without PCI_IOV
>
>
>  drivers/pci/pci.c | 5 +++++
>  1 file changed, 5 insertions(+)

Tested-by: Wei Gong <gongwei833x@xxxxxxxxx>

Retest done, and it works well.

I would rephrase the bug description.

According to the SR-IOV specification, the vendor_id and device_id
fields of all VFs return FFFFh when read. So when a VF device is passed
to pci_device_is_present(), it is misjudged as surprise-removed.

When I/O is in flight on a VF and the virtio_blk VF devices are then
disabled in the normal way, the disable operation hangs, and the
virtio-blk device I/O hangs with it:

task:bash state:D stack: 0 pid: 1773 ppid: 1241 flags:0x00004002
Call Trace:
 <TASK>
 __schedule+0x2ee/0x900
 schedule+0x4f/0xc0
 blk_mq_freeze_queue_wait+0x69/0xa0
 ? wait_woken+0x80/0x80
 blk_mq_freeze_queue+0x1b/0x20
 blk_cleanup_queue+0x3d/0xd0
 virtblk_remove+0x3c/0xb0 [virtio_blk]
 virtio_dev_remove+0x4b/0x80
 device_release_driver_internal+0x103/0x1d0
 device_release_driver+0x12/0x20
 bus_remove_device+0xe1/0x150
 device_del+0x192/0x3f0
 device_unregister+0x1b/0x60
 unregister_virtio_device+0x18/0x30
 virtio_pci_remove+0x41/0x80
 pci_device_remove+0x3e/0xb0
 device_release_driver_internal+0x103/0x1d0
 device_release_driver+0x12/0x20
 pci_stop_bus_device+0x79/0xa0
 pci_stop_and_remove_bus_device+0x13/0x20
 pci_iov_remove_virtfn+0xc5/0x130
 ? pci_get_device+0x4a/0x60
 sriov_disable+0x33/0xf0
 pci_disable_sriov+0x26/0x30
 virtio_pci_sriov_configure+0x6f/0xa0
 sriov_numvfs_store+0x104/0x140
 dev_attr_store+0x17/0x30
 sysfs_kf_write+0x3e/0x50
 kernfs_fop_write_iter+0x138/0x1c0
 new_sync_write+0x117/0x1b0
 vfs_write+0x185/0x250
 ksys_write+0x67/0xe0
 __x64_sys_write+0x1a/0x20
 do_syscall_64+0x61/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f21bd1f3ba4
RSP: 002b:00007ffd34a24188 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f21bd1f3ba4
RDX: 0000000000000002 RSI: 0000560305040800 RDI: 0000000000000001
RBP: 0000560305040800 R08: 000056030503fd50 R09: 0000000000000073
R10: 00000000ffffffff R11: 0000000000000202 R12: 0000000000000002
R13: 00007f21bd2de760 R14: 00007f21bd2da5e0 R15: 00007f21bd2d99e0

When virtio_blk performs I/O, there are two stages:

1. The I/O is dispatched: queue_usage_counter++;
2. The I/O is completed after the interrupt is received:
   queue_usage_counter--;

When the virtio_blk VFs are disabled:

    if (!pci_device_is_present(pci_dev))
        virtio_break_device(&vp_dev->vdev);

the virtqueues of the VF devices are marked broken. The interrupt then
signals that the I/O has finished, but because the virtqueue is seen to
be marked broken, the completion of that I/O is not processed, so
queue_usage_counter is never cleared. Since the disk is being removed at
the same time, the queue is frozen, and the freeze has to wait for
queue_usage_counter to be cleared. Therefore the removal hangs.

Thanks,
Wei Gong
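
For illustration only, the sequence above can be modelled in user space.
This is a rough sketch, not kernel code: every type and helper here
(model_dev, model_queue, present_by_id_read, present_by_physfn,
disable_vf_with_io_in_flight) is hypothetical and merely stands in for
the behaviour described in this thread. It assumes one I/O is in flight
when the VF is disabled, and that the patched check simply consults the
PF for a VF, as the quoted description says.

```c
/* User-space model only: none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_dev {
	bool is_virtfn;            /* true for a VF */
	struct model_dev *physfn;  /* parent PF, set only for a VF */
	bool plugged;              /* whether the hardware is really there */
};

struct model_queue {
	int usage_counter;         /* stands in for queue_usage_counter */
	bool broken;               /* stands in for the virtqueue broken flag */
};

/* Vendor ID read: a VF always returns FFFFh, as does a missing device. */
static uint16_t model_read_vendor_id(const struct model_dev *dev)
{
	if (dev->is_virtfn || !dev->plugged)
		return 0xFFFF;
	return 0x1AF4;             /* sample value for a present PF */
}

/* Behaviour before the patch: presence judged from the ID read alone. */
static bool present_by_id_read(const struct model_dev *dev)
{
	return model_read_vendor_id(dev) != 0xFFFF;
}

/* Behaviour as the patch describes it: for a VF, ask the PF instead. */
static bool present_by_physfn(const struct model_dev *dev)
{
	if (dev->is_virtfn)
		dev = dev->physfn;
	return model_read_vendor_id(dev) != 0xFFFF;
}

/* Stage 1: dispatch takes a reference on the queue. */
static void dispatch_io(struct model_queue *q)
{
	q->usage_counter++;
}

/* Stage 2: the interrupt drops the reference, unless the queue is broken. */
static void complete_io_irq(struct model_queue *q)
{
	if (q->broken)
		return;            /* completion skipped, reference leaked */
	assert(q->usage_counter > 0);
	q->usage_counter--;
}

typedef bool (*presence_fn)(const struct model_dev *dev);

/*
 * Disable a VF while one I/O is in flight. Returns the counter value the
 * queue freeze would have to wait on; non-zero models the hang.
 */
static int disable_vf_with_io_in_flight(presence_fn is_present)
{
	struct model_dev pf = { .is_virtfn = false, .physfn = NULL, .plugged = true };
	struct model_dev vf = { .is_virtfn = true, .physfn = &pf, .plugged = true };
	struct model_queue q = { .usage_counter = 0, .broken = false };

	dispatch_io(&q);
	if (!is_present(&vf))
		q.broken = true;   /* models virtio_break_device() */
	complete_io_irq(&q);
	return q.usage_counter;
}
```

In this model, disable_vf_with_io_in_flight(present_by_id_read) leaves
the counter at 1, which corresponds to the freeze never finishing, while
disable_vf_with_io_in_flight(present_by_physfn) returns 0.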