On Mon, Jan 22, 2024 at 11:43 PM Stefan Hajnoczi <stefanha@xxxxxxxxxx> wrote: > > On Mon, Jan 22, 2024 at 07:07:22PM +0800, Yi Sun wrote: > > Ensure no remaining requests in virtqueues before resetting vdev and > > deleting virtqueues. Otherwise these requests will never be completed. > > It may cause the system to become unresponsive. So it is better to place > > blk_mq_quiesce_queue() in front of virtio_reset_device(). > > QEMU's virtio-blk device implementation completes all requests during > device reset. Most device implementations have to do the same to avoid > leaving dangling requests in flight across device reset. > > Which device implementation are you using and why is it safe for the > device is simply drop requests across device reset? > > Stefan Virtio-blk device implementation completes all requests during device reset, but this can only ensure that the device has finished using virtqueue. We should also consider the driver's use of virtqueue. I caught such an example. Before del_vqs, the request had been processed by the device, but it had not been processed by the driver. Although I am using kernel5.4, I think this problem may also occur with the latest version of kernel. The debug code I added is as follows: virtblk_freeze() { vdev reset(); quiesce queue(); if (virtqueue->num_free != 1024) //1024 is the init value. BUG_ON(1); vdev del_vqs(); } BUG_ON triggered the dump, the analysis is as follows: There is one request left in request_queue. crash_arm64> struct request ffffff81f0560000 | grep -e state -e __data_len __data_len = 20480, state = MQ_RQ_IN_FLIGHT, crash_arm64> vring_virtqueue.packed,last_used_idx,broken,vq 0xffffff8086f92900 | grep -e num -e used_wrap_counter -e last_used_idx -e broken -e num_free -e desc_state -e "desc =" num = 1024, desc = 0xffffff8085ff8000, used_wrap_counter = false, desc_state = 0xffffff8085610000, last_used_idx = 487, broken = false, num_free = 1017, Find desc based on last_used_idx. Through flags, we can know that the request has been processed by the device, but it is still in flight state because it has not had time to run virtblk_done(). crash_arm> vring_packed_desc ffffff8085ff9e70 struct vring_packed_desc { addr = 10474619192, len = 20481, id = 667, flags = 2 } I'm using a closed source virtual machine, so I can't see the source code for it, but I'm guessing it's similar to qemu. After the device completes the request, we must also ensure that the driver can complete the request in virtblk_done().