[PATCH RFC 0/7] virtio: avoid various hang situations during hot-unplug

Hi,

this patch-set tries to solve various hang situations when virtio devices
(network or block) are hot-unplugged from a KVM guest.

On System z there exists no handshake mechanism between host and guest
when a device is hot-unplugged. The device is removed and no further I/O
is possible.

The guest is notified about the hard removal with a CRW machine check.
As per architecture, the host must respond to any I/O operation for the
removed device with an error condition as if the device had never been
there.

During machine check handling in the guest, virtio exit functions try to
perform cleanup operations by triggering final I/O, including appropriate
host kicks. These operations fail, or do not complete, and lead to several
kinds of hang situations. In particular, virtio-ccw guest->host notification
on an unplugged device will receive an error; this is, however, not reflected
back to the affected virtqueues.

Here are the details for some of the errors.

In the network case, a hang (loop) occurs when a machine check is handled
on System z due to a vlan device removal. The loop in virtnet_send_command()
that spins waiting for a response to the final I/O never completes, because
the preceding host kick operation (virtqueue_kick()) was unsuccessful.

Patches [1,2] below flag the virtqueue as 'broken' when a host kick failure
is detected. Patch [3] uses this error info to avoid an endless invocation
of cpu_relax() when waiting for the command to complete.
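
The idea behind patches [1,2,3] can be illustrated with a small userspace
model. This is only a sketch, not the code from the patches: struct model_vq
and all model_* functions are hypothetical names invented for this example,
and the spin limit exists only so the model terminates on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of a virtqueue; NOT the kernel's struct virtqueue. */
struct model_vq {
	bool broken;        /* set once a host notification has failed */
	bool host_present;  /* false after the device was hot-unplugged */
	unsigned int used;  /* completed buffers visible to the guest */
};

/* Model of a host kick: fails and flags the queue if the device is gone. */
bool model_kick(struct model_vq *vq)
{
	assert(vq != NULL);
	if (!vq->host_present) {
		vq->broken = true;
		return false;
	}
	vq->used++;  /* pretend the host completes the request at once */
	return true;
}

/*
 * Wait for a completion. Without the 'broken' check this loop would spin
 * forever on an unplugged device, because 'used' never changes.
 */
bool model_wait_for_response(struct model_vq *vq, unsigned long max_spins)
{
	unsigned long spins = 0;

	assert(vq != NULL);
	while (vq->used == 0) {
		if (vq->broken)
			return false;  /* give up: no response will arrive */
		if (++spins >= max_spins)
			return false;  /* safety net for the model only */
	}
	vq->used--;
	return true;
}

/* Model of a control command: kick the host, then wait for the answer. */
bool model_send_command(struct model_vq *vq)
{
	model_kick(vq);
	return model_wait_for_response(vq, 1000000UL);
}
```

With host_present set to false, model_send_command() returns false right
after the failed kick instead of spinning; with host_present set to true it
completes normally and the queue is not flagged.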

Hang situations also occur when a block device is hot-unplugged.

Several different errors occur when a block device with mounted file-system(s)
is hot-unplugged. Asynchronous writeback functions, as well as page cache read
or write operations, end up in never-ending wait situations. Hang situations
also occur during device removal when virtblk_remove() invokes del_gendisk()
to sync dirty inode pages (invalidate_partition()).

The below patches [4,5,6,7] also exploit a 'broken' virtqueue in order to
trigger IO errors as well as to prevent final hanging IO operations.
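
The block-side approach can be sketched in the same way. Again, this is a
userspace model under stated assumptions, not the virtio_blk changes
themselves: struct blk_model_queue and the blk_model_* functions are
hypothetical, and the model simply fails each request with -EIO and marks
the queue as dying once the virtqueue is known to be broken.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of a block request queue; names are hypothetical. */
struct blk_model_queue {
	bool vq_broken;          /* underlying virtqueue flagged as broken */
	bool dying;              /* queue no longer accepts new requests */
	unsigned int completed;  /* requests that reached the host */
};

/* Submit one request: fail fast with -EIO instead of waiting forever. */
int blk_model_submit(struct blk_model_queue *q)
{
	assert(q != NULL);
	if (q->vq_broken) {
		q->dying = true;
		return -EIO;
	}
	q->completed++;
	return 0;
}

/*
 * Model of a final writeback of nr_dirty pages during device removal.
 * Returns the number of requests that failed. Every request terminates,
 * either successfully or with an error, so the caller never hangs.
 */
unsigned int blk_model_flush(struct blk_model_queue *q, unsigned int nr_dirty)
{
	unsigned int failed = 0;
	unsigned int i;

	for (i = 0; i < nr_dirty; i++) {
		if (blk_model_submit(q) != 0)
			failed++;
	}
	return failed;
}
```

On a broken queue, blk_model_flush() reports every request as failed and
returns; on a healthy queue all requests complete and none fail.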


Heinz Graalfs (7):
  virtio_ring: add new functions virtqueue{_set_broken()/_is_broken()}
  s390/virtio_ccw: set virtqueue as broken if host notify failed
  virtio_net: avoid cpu_relax() call loop in case virtqueue is broken
  virtio_blk: use dummy virtqueue_notify() to detect host kick error
  virtio_blk: do not free device id if virtqueue is broken
  virtio_blk: set request queue as dying in case virtqueue is broken
  virtio_blk: trigger IO errors in case virtqueue is broken

 drivers/block/virtio_blk.c    | 41 ++++++++++++++++++++++++++++++++++++-----
 drivers/net/virtio_net.c      |  4 +++-
 drivers/s390/kvm/virtio_ccw.c |  2 ++
 drivers/virtio/virtio_ring.c  | 16 ++++++++++++++++
 include/linux/virtio.h        |  4 ++++
 5 files changed, 61 insertions(+), 6 deletions(-)

-- 
1.8.3.1
