On 21/11/13 07:47, Michael S. Tsirkin wrote:
On Wed, Nov 20, 2013 at 04:22:00PM +0100, Heinz Graalfs wrote:
Hi,
when an active virtio block device is hot-unplugged from a KVM guest, running
affected guest user applications are not aware of any errors that occur due
to the lost device. This patch-set adds code to avoid further request queueing
when a lost block device is detected, resulting in appropriate error info.
On System z there exists no handshake mechanism between host and guest
when a device is hot-unplugged. The device is removed and no further I/O
is possible.
When an online channel device disappears on System z the kernel's CIO layer
informs the driver (virtio_ccw) about the lost device.
It's just the block device drivers that care to provide a notify
callback.
Here are some more error details:
For a particular block device virtio's request function virtblk_request()
is called by the block layer to queue requests to be handled by the host.
In case of a lost device requests can still be queued, but an appropriate
subsequent host kick usually fails. This leads to situations where no error
feedback is shown.
In order to prevent request queueing for lost devices appropriate settings
in the block layer should be made. Exploiting System z's CIO notify handler
callback, and adding a corresponding new virtio_driver notify() handler to
'inform' the block layer, solve this task.
Patch 1 adds an optional notify() callback to virtio_driver.
Patch 2 adds a new notify() callback for the virtio_blk driver. When called
for a lost device settings are made to prevent future request queueing.
Patch 3 modifies the CIO notify handler in virtio_ccw's transport layer to pass
on the lost device info to virtio's backend driver virtio_blk.
Question: I guess remove callback is invoked eventually?
Could you please clarify why isn't this sufficient?
yes, the remove callback is invoked lateron, and it could be done there.
However, it should be done conditionally, and prior to invoking
del_gendisk() (which triggers final I/O). We would still have the need
for such notification information. The remove callback is also invoked
when a device is set offline, and in that case we don't want a queue to
reject further requests. The way it is done right doesn't affect the
remove callback. The weird situation, however, is solved by the new
notify callback.
Doing it in blk_cleanup_queue() (also triggered from virtblk_remove())
is too late for this scenario of a lost device. One wouldn't see any
errors, but experience a 'hang' due to inclomplete I/O. The invocation
of virtblk_request() indirectly caused by del_gendisk() would accept
requests, the subsequent host notification, however, would fail. (This
is probably another 'window' that should be closed.)
Heinz Graalfs (3):
virtio: add notify() callback to virtio_driver
virtio_blk: add virtblk_notify() as virtio_driver's notify() callback
virtio_ccw: invoke virtio_driver's notify() on CIO_GONE notification
drivers/block/virtio_blk.c | 14 ++++++++++++++
drivers/s390/kvm/virtio_ccw.c | 14 ++++++++++++--
drivers/virtio/virtio.c | 8 ++++++++
include/linux/virtio.h | 10 ++++++++++
4 files changed, 44 insertions(+), 2 deletions(-)
--
1.8.3.1
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization