On 24/08/2017 23:23, Michael S. Tsirkin wrote:
On Thu, Aug 24, 2017 at 07:42:07PM +0200, Pierre Morel wrote:
On 24/08/2017 16:19, Michael S. Tsirkin wrote:
On Wed, Aug 23, 2017 at 06:33:02PM +0200, Pierre Morel wrote:
Resetting a device can sometimes fail, even for a virtual device.
If the device has not been reset after a while, the driver should
abandon the retries.
This is the change proposed for the modern virtio_pci.
More generally, when this happens, the virtio driver can set the
VIRTIO_CONFIG_S_FAILED status flag to notify the caller.
The virtio core can check whether the reset was successful by testing
this flag after a reset.
This behavior is backward compatible with existing drivers.
This behavior seems to me to be compatible with the Virtio 1.0 specification,
section 2.1, Device Status Field.
Here I definitely need your opinion: is it right?
This patch also leads to another question:
do we care if a device provided by the hypervisor is buggy?
Signed-off-by: Pierre Morel <pmorel@xxxxxxxxxxxxxxxxxx>
So I think this is not the best place to start to add error recovery.
I agree, there cannot be any error recovery there.
If the reset does not work, we can abandon the device until the next reset of the
hypervisor.
On probe, yes. But failures are more likely to trigger at other times.
OK, what about:
- On probe: if the reset fails, the probe fails.
- On freeze and remove: we cannot free resources that are shared
with the device, at least the queues.
... we can only signal the error and give up on the device.
It should be much more common to have a situation where device gets
broken while it's being used. Spec has a NEEDS_RESET flag for this.
Yes, the device side can set this flag, but that is another problem; it
assumes that:
- the transport, on the device side, still works
- it is able to detect that the device needs a reset
- a reset is effective
Right. OTOH in this case there's more we can do.
Yes, I did not find a single test of this flag (NEEDS_RESET),
even though QEMU sets it quite often (through virtio_error()).
The decision to reset the device must come from the driver.
The protocol to reset the device is device/driver specific... a lot of work.
Shouldn't it be separate from the "reset failed" problem?
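
To make the NEEDS_RESET discussion concrete, here is a minimal sketch of what a driver-side test of the status byte could look like. The bit values follow the Device Status Field of the Virtio 1.0 specification (DEVICE_NEEDS_RESET is 64, FAILED is 128). The enum, the function name and the policy of giving up on FAILED are assumptions made for illustration; none of them exist in the kernel or in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Device status bits, values per the Virtio 1.0 specification. */
#define STATUS_DRIVER_OK   0x04
#define STATUS_NEEDS_RESET 0x40
#define STATUS_FAILED      0x80

/* The two flags are tested independently, so they must not overlap. */
static_assert((STATUS_NEEDS_RESET & STATUS_FAILED) == 0,
	      "status flags must be distinct bits");

/* Hypothetical outcome of inspecting the status byte. */
enum device_action {
	DEVICE_OK,		/* nothing to do */
	DEVICE_RESET_REQUESTED,	/* device asked for a reset */
	DEVICE_GIVE_UP,		/* a reset already failed, stop using the device */
};

/* Assumed policy: FAILED takes precedence over NEEDS_RESET, because a
 * device whose reset already failed cannot be recovered by resetting it. */
static enum device_action classify_status(uint8_t status)
{
	if (status & STATUS_FAILED)
		return DEVICE_GIVE_UP;
	if (status & STATUS_NEEDS_RESET)
		return DEVICE_RESET_REQUESTED;
	return DEVICE_OK;
}
```

What a driver does on DEVICE_RESET_REQUESTED is the device/driver specific part mentioned above, so it is deliberately left out of the sketch.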
Regards,
Pierre
I think we should start by coding up that support in all virtio drivers.
As a next step, we can add more code to detect unexpected behaviour by
the host and mark device as broken. Then we can do more things by
looking at the broken flag.
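
As a rough illustration of the "mark device as broken" idea, the sketch below flags a device when the host reports a length larger than the buffer the driver posted, and refuses further requests afterwards. The struct, the function names and the choice of check are all hypothetical; this is not the existing kernel API, only one possible shape of such a check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state; only the broken flag matters here. */
struct sketch_device {
	bool broken;
};

/* Example of "unexpected behaviour by the host": the length reported as
 * used exceeds what the driver posted. Mark the device broken and
 * report an I/O error. */
static int check_used_len(struct sketch_device *dev, size_t reported,
			  size_t posted)
{
	assert(dev != NULL);
	if (reported > posted) {
		dev->broken = true;
		return -EIO;
	}
	return 0;
}

/* Later code paths only look at the flag before touching the device. */
static int submit_request(struct sketch_device *dev)
{
	assert(dev != NULL);
	return dev->broken ? -EIO : 0;
}
```

Once the flag is set, every later entry point can fail fast without talking to the host again.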
It seems difficult to me.
But maybe I jumped too quickly to the conclusion that there is nothing to do.
I am still thinking about it.
Best regards
Pierre
---
drivers/virtio/virtio.c | 4 ++++
drivers/virtio/virtio_pci_modern.c | 11 ++++++++++-
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 48230a5..6255dc4 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -324,6 +324,8 @@ int register_virtio_device(struct virtio_device *dev)
 	/* We always start by resetting the device, in case a previous
 	 * driver messed it up. This also tests that code path a little. */
 	dev->config->reset(dev);
+	if (dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED)
+		return -EIO;
 
 	/* Acknowledge that we've seen the device. */
 	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
@@ -373,6 +375,8 @@ int virtio_device_restore(struct virtio_device *dev)
 	/* We always start by resetting the device, in case a previous
 	 * driver messed it up. */
 	dev->config->reset(dev);
+	if (dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED)
+		return -EIO;
 
 	/* Acknowledge that we've seen the device. */
 	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index 2555d80..bfc5fc1 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -270,6 +270,7 @@ static void vp_set_status(struct virtio_device *vdev, u8 status)
 static void vp_reset(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	int retry_count = 10;
 	/* 0 status means a reset. */
 	vp_iowrite8(0, &vp_dev->common->device_status);
 	/* After writing 0 to device_status, the driver MUST wait for a read of
@@ -277,8 +278,16 @@ static void vp_reset(struct virtio_device *vdev)
 	 * This will flush out the status write, and flush in device writes,
 	 * including MSI-X interrupts, if any.
 	 */
-	while (vp_ioread8(&vp_dev->common->device_status))
+	while (vp_ioread8(&vp_dev->common->device_status) && retry_count--)
 		msleep(1);
+	/* If the read did not return 0 before the timeout consider that
+	 * the device failed.
+	 */
+	if (retry_count <= 0) {
+		virtio_add_status(vdev, VIRTIO_CONFIG_S_FAILED);
+		return;
+	}
+	virtio_add_status(vdev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
 	/* Flush pending VQ/configuration callbacks. */
 	vp_synchronize_vectors(vdev);
 }
--
2.3.0
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany