> From: Li,Rongqing <lirongqing@xxxxxxxxx> > Sent: Thursday, January 11, 2024 5:28 PM > > > > -----Original Message----- > > From: Parav Pandit <parav@xxxxxxxxxx> > > Sent: Thursday, January 11, 2024 7:55 PM > > To: Li,Rongqing <lirongqing@xxxxxxxxx>; mst@xxxxxxxxxx; > > jasowang@xxxxxxxxxx; xuanzhuo@xxxxxxxxxxxxxxxxx; > > virtualization@xxxxxxxxxxxxxxx > > Cc: songyang23 <songyang23@xxxxxxxxx>; liubokai > <liubokai@xxxxxxxxx>; > > Song,Zhan <songzhan@xxxxxxxxx> > > Subject: RE: [RFC] Revert "virtio_pci: Support surprise removal of > > virtio pci > > > > > > > From: Li,Rongqing <lirongqing@xxxxxxxxx> > > > Sent: Thursday, January 11, 2024 5:18 PM > > > > > > > > > > -----Original Message----- > > > > From: Parav Pandit <parav@xxxxxxxxxx> > > > > Sent: Thursday, January 11, 2024 6:18 PM > > > > To: Li,Rongqing <lirongqing@xxxxxxxxx>; mst@xxxxxxxxxx; > > > > jasowang@xxxxxxxxxx; xuanzhuo@xxxxxxxxxxxxxxxxx; > > > > virtualization@xxxxxxxxxxxxxxx > > > > Cc: songyang23 <songyang23@xxxxxxxxx>; liubokai > > > <liubokai@xxxxxxxxx>; > > > > Song,Zhan <songzhan@xxxxxxxxx> > > > > Subject: RE: [RFC] Revert "virtio_pci: Support surprise removal of > > > > virtio pci > > > > > > > > > > > > > From: Li,Rongqing <lirongqing@xxxxxxxxx> > > > > > Sent: Thursday, January 11, 2024 2:45 PM > > > > > > > > > > Revert "virtio_pci: Support surprise removal of virtio pci device" > > > > > > > > > > This reverts commit 43bb40c5b92659966bdf4bfe584fde0a3575a049. > > > > > > > > > > Marking the device as broken will cause the uncompleted IO > > > > > request on virtio-blk since virtblk_done will not continue when > > > > > it find the broken virtqueu at last it will cause the failure of > > > > > removal of virtio-blk device because of uncompleted IO request > > > > > > > > > > The correct fix for the issue that commit 43bb40c5b9 ("virtio_pci: > > > > > Support surprise removal of virtio pci device") tried to fix is > > > > > that virtio backend always complete the IO request as soon as > > > > > possible > > > > > > > > > This can never be guaranteed by a device and by pci spec given it > > > > is surprise removal by definition. > > > > > > > > For Linux virtio blk is not the only block device which has surprised > removal. > > > > Nvme blk driver has supported surprise removal for several years now. > > > > I am sure nbd and others would have too given its network. > > > > > > > > I am not familiar with the blk driver stack. > > > > So please explore with blk community of how to complete an > > > > aborted/never completed io which never completed by the device. > > > > Since the blk driver knows exactly that the device is removed, it > > > > can 100% complete the io to upper layer with the error in deterministic > way. > > > > > > > > > > > > > Without the commit 43bb40c5b92("virtio_pci: Support surprise removal > > > of virtio pci device"), My virtio-blk device can be removed > > > successfully, with this commit, and if we use fio to send io > > > requests the virtio-blk, The removal always failed. > > > > > How can it complete when the device is not going complete the io ever? > > > > Now this issue is that the virtio backend (device) has completed the io request, > but virtio-blk Can not get the completed io request, since the virtqueue is > marked as broken. Your test was just a timing luck when the io was completed, it is not always the case. Hence, vblk driver should flush all the pending requests with error to upper layers for which the ios have not yet processed after it was broken because when surprise removal request was received from that point onwards there is no communication with device. > > -Li > > > > > > > Once virt queue is marked as broken, io request can not be finished > > > since lots of virtio functions will check If the vq is broken, if > > > broken, exit directly, like virtqueue_get_buf_ctx_packed > > > > > > static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq, > > > unsigned int *len, > > > void **ctx) > > > { > > > struct vring_virtqueue *vq = to_vvq(_vq); > > > u16 last_used, id, last_used_idx; > > > bool used_wrap_counter; > > > void *ret; > > > > > > START_USE(vq); > > > > > > if (unlikely(vq->broken)) { > > > END_USE(vq); > > > return NULL; > > > } > > > > > > Nvme is different from virtio, it can ensure that all requests are > > > flushed to completion, like below patch: > > > > > Who is "it" in your above comment? Is it Driver or device? > > > > In below commit who (driver/device) flushes the completions? > > If it is device, then it is not surprise removal, right? > > > > > commit 1d39e6928cbd0eb737c51545210b5186d5551ba1 > > > Author: Keith Busch <keith.busch@xxxxxxxxx> > > > Date: Wed Jun 6 08:13:08 2018 -0600 > > > > > > nvme-pci: unquiesce dead controller queues > > > > > > This patch ensures the nvme namsepace request queues are not > > quiesced > > > on a surprise removal. It's possible the queues were previously killed > > > in a failed reset, so the queues need to be unquiesced to ensure all > > > requests are flushed to completion. > > > > > > Signed-off-by: Keith Busch <keith.busch@xxxxxxxxx> > > > Reviewed-by: Sagi Grimberg <sagi@xxxxxxxxxxx> > > > Reviewed-by: Johannes Thumshirn <jthumshirn@xxxxxxx> > > > Signed-off-by: Christoph Hellwig <hch@xxxxxx> > > > Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> > > > > > > > > > -Li