> From: Parav Pandit <parav@xxxxxxxxxx>
> Sent: Tuesday, March 5, 2024 12:06 PM
>
> When the PCI device is surprise removed, requests won't complete from the
> device. These IOs are never completed and disk deletion hangs indefinitely.
>
> Fix it by aborting the IOs which the device will never complete when the
> VQ is broken.
>
> With this fix fio now completes swiftly.
> An alternative of an IO timeout has been considered; however, a timeout can
> take a very long time. For an unresponsive device, quickly completing the
> request with an error enables users and upper layers to react quickly.
>
> Verified with multiple device unplug cycles with pending IOs in the virtio
> used ring and some pending with the device.
>
> Also verified without surprise removal.
>
> Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device")
> Cc: stable@xxxxxxxxxxxxxxx
> Reported-by: lirongqing@xxxxxxxxx
> Closes: https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b4741@baidu.com/
> Co-developed-by: Chaitanya Kulkarni <kch@xxxxxxxxxx>
> Signed-off-by: Chaitanya Kulkarni <kch@xxxxxxxxxx>
> Signed-off-by: Parav Pandit <parav@xxxxxxxxxx>
> ---

Please ignore this patch; I am still debugging and verifying it.
It was incorrectly CCed to stable. I am sorry for the noise.

> changelog:
> v0->v1:
> - addressed comments from Ming and Michael
> - changed the flow to not depend on the broken vq anymore, to avoid
>   the race of missing the detection
> - flow updated to quiesce the request queue and device, followed by
>   syncing with any ongoing interrupt handler or callbacks, and
>   finally finishing the requests which are not completed by the device
>   with an error.
> ---
>  drivers/block/virtio_blk.c | 46 +++++++++++++++++++++++++++++++++++---
>  1 file changed, 43 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 2bf14a0e2815..1956172b4b1a 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -1562,9 +1562,52 @@ static int virtblk_probe(struct virtio_device *vdev)
>  	return err;
>  }
>
> +static bool virtblk_cancel_request(struct request *rq, void *data)
> +{
> +	struct virtblk_req *vbr = blk_mq_rq_to_pdu(rq);
> +
> +	vbr->in_hdr.status = VIRTIO_BLK_S_IOERR;
> +	if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq))
> +		blk_mq_complete_request(rq);
> +
> +	return true;
> +}
> +
>  static void virtblk_remove(struct virtio_device *vdev)
>  {
>  	struct virtio_blk *vblk = vdev->priv;
> +	int i;
> +
> +	/* Block the upper layer so it does not send any new requests */
> +	blk_mq_quiesce_queue(vblk->disk->queue);
> +
> +	mutex_lock(&vblk->vdev_mutex);
> +
> +	/* Stop all the virtqueues and the configuration change notification,
> +	 * and also synchronize with pending interrupt handlers.
> +	 */
> +	virtio_reset_device(vdev);
> +
> +	mutex_unlock(&vblk->vdev_mutex);
> +
> +	/* Synchronize with any callback handlers for request completion */
> +	for (i = 0; i < vblk->num_vqs; i++)
> +		virtblk_done(vblk->vqs[i].vq);
> +
> +	blk_sync_queue(vblk->disk->queue);
> +
> +	/* At this point the block layer and device/transport are quiet;
> +	 * hence, safely complete all the pending requests with an error.
> +	 */
> +	blk_mq_tagset_busy_iter(&vblk->tag_set, virtblk_cancel_request, vblk);
> +	blk_mq_tagset_wait_completed_request(&vblk->tag_set);
> +
> +	/*
> +	 * Unblock any pending dispatch I/Os before we destroy the device.
> +	 * From del_gendisk(), __blk_mark_disk_dead(disk) will set the GD_DEAD
> +	 * flag, which makes sure any new I/O from bio_queue_enter() will fail.
> +	 */
> +	blk_mq_unquiesce_queue(vblk->disk->queue);
>
>  	/* Make sure no work handler is accessing the device.
>  	 */
>  	flush_work(&vblk->config_work);
> @@ -1574,9 +1617,6 @@ static void virtblk_remove(struct virtio_device *vdev)
>
>  	mutex_lock(&vblk->vdev_mutex);
>
> -	/* Stop all the virtqueues. */
> -	virtio_reset_device(vdev);
> -
>  	/* Virtqueues are stopped, nothing can use vblk->vdev anymore. */
>  	vblk->vdev = NULL;
>
> --
> 2.34.1