On 8/24/22 11:42, Alvaro Karsz wrote:
> Hi Hannes,
>> a) let the device do the timeout: pass in a timeout value with the
>> command, and allow the device to return an ETIMEDOUT error when the
>> timeout expires. Then it's up to the device to do the necessary timeout
>> handling; the server won't be involved at all (except for handling an
>> ETIMEDOUT error).
> This won't work if the device crashes.
>> b) implement an 'abort' command. With that the server controls the
>> timeout, and is allowed to send an 'abort' command when the timeout
>> expires. That requires the device to be able to abort commands (which
>> not all devices can do), but avoids having to implement timeout
>> handling in the device.
> I actually thought about this idea.
> This may work, but you'll still have a few moments when the server
> assumes that the command failed and the network device assumes that
> it succeeded.
> So the server may still receive packets in an unexpected queue.
No. The server may only assume that the command failed once it gets the
response for the abort command.
Before that, the state of the command is undefined, and the server may
not assume anything here.
And then we get into the fun topic of timing out aborts, which really
can only be resolved if you have a fool-proof way of resetting the queue
itself. But I guess virtio can do that (right?).
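To make the "undefined until the abort response" rule concrete, here is a minimal sketch of a per-command state machine on the server (driver) side. It assumes a hypothetical driver that tracks each outstanding command in one of four states; none of these names come from the virtio spec or from the Linux virtio drivers.

```c
#include <stdbool.h>

/* Hypothetical driver-side view of one outstanding command. */
enum cmd_state {
    CMD_INFLIGHT,    /* submitted, no response yet */
    CMD_ABORT_SENT,  /* local timeout fired, abort submitted: state is UNDEFINED */
    CMD_COMPLETED,   /* device returned a normal completion */
    CMD_ABORTED,     /* device confirmed that it aborted the command */
};

/* Local timeout fired: do not fail the command, ask the device to abort it. */
static enum cmd_state on_timeout(enum cmd_state s)
{
    return s == CMD_INFLIGHT ? CMD_ABORT_SENT : s;
}

/* A normal completion may still arrive while the abort is in flight. */
static enum cmd_state on_completion(enum cmd_state s)
{
    return (s == CMD_INFLIGHT || s == CMD_ABORT_SENT) ? CMD_COMPLETED : s;
}

/* Response to the abort. 'aborted' is false when the device no longer had
 * the command (it already completed it); in that case keep the current
 * state and rely on the normal completion instead. */
static enum cmd_state on_abort_response(enum cmd_state s, bool aborted)
{
    if (s == CMD_ABORT_SENT && aborted)
        return CMD_ABORTED;
    return s;
}

/* Only a confirmed abort lets the driver treat the command as failed. */
static bool may_assume_failed(enum cmd_state s)
{
    return s == CMD_ABORTED;
}
```

Note that `CMD_ABORT_SENT` never counts as failed; that is exactly the window in which neither side may assume anything.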
>> I am very much in favour of having timeouts for virtio commands; we've
>> had several massive customer escalations which could have been solved if
>> we had been able to set the command timeout in the VM.
>> As this was for virtio-scsi/virtio-block, I would advocate having a
>> generic virtio command timeout, not a protocol-specific one.
> This may be difficult to implement,
> especially when multiple commands may be queued at the same time and
> the device can handle the commands in any order.
> We'll need to add identifiers for every command.
Why, but of course. You cannot assume in-order delivery of the
completions; in fact, that's the whole _point_ of having a queue-based
I/O command delivery method.
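Because completions can arrive in any order, the driver has to match each completion to its command by identifier rather than assuming it is the oldest outstanding one. A minimal sketch, assuming a hypothetical fixed-size table of pending commands keyed by a 16-bit identifier that the device echoes back (the table, its size and the function names are illustrative only):

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_INFLIGHT 8   /* hypothetical queue depth */

struct pending {
    uint16_t id;      /* identifier the device echoes back on completion */
    int      in_use;  /* nonzero while the command is outstanding */
    int      status;  /* status reported by the device */
};

static struct pending pending_table[MAX_INFLIGHT];

/* Record a submitted command; returns its slot, or -1 if the table is full. */
static int submit(uint16_t id)
{
    for (int i = 0; i < MAX_INFLIGHT; i++) {
        if (!pending_table[i].in_use) {
            pending_table[i] = (struct pending){ .id = id, .in_use = 1, .status = 0 };
            return i;
        }
    }
    return -1;
}

/* Look the completed command up by id; completion order is irrelevant.
 * Returns NULL for an unknown or already-completed id. */
static struct pending *complete(uint16_t id, int status)
{
    for (int i = 0; i < MAX_INFLIGHT; i++) {
        if (pending_table[i].in_use && pending_table[i].id == id) {
            pending_table[i].in_use = 0;
            pending_table[i].status = status;
            return &pending_table[i];
        }
    }
    return NULL;
}
```

Submitting ids 10, 11 and 12 and then completing 12 before 10 works, and a second completion for 12 is rejected instead of being attributed to some other command.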
> I'm actually referring here to the Linux kernel implementation of
> virtnet control commands, in which the server spins for a response.
Sheesh. Spinning for a response is never a good idea, as this means
you'll end up with a non-interruptible command in the guest (essentially
an ioctl into the hypervisor).
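The alternative to spinning is to sleep until the completion path signals the waiter, with an upper bound on the wait. In the kernel, `wait_for_completion_interruptible_timeout()` plays that role; the sketch below is only a user-space analogue using POSIX `pthread_cond_timedwait()`, and the `struct response` type and both function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical response slot shared by the submitter and the completion path. */
struct response {
    pthread_mutex_t lock;
    pthread_cond_t  done_cv;
    bool            done;
};

/* Called from the completion path (e.g. a virtqueue callback). */
static void response_complete(struct response *r)
{
    pthread_mutex_lock(&r->lock);
    r->done = true;
    pthread_cond_signal(&r->done_cv);
    pthread_mutex_unlock(&r->lock);
}

/* Sleep (instead of spinning) until the response arrives or timeout_ms
 * elapses. Returns 0 on completion, otherwise the pthread_cond_timedwait()
 * error code (ETIMEDOUT when the deadline passed). */
static int response_wait(struct response *r, long timeout_ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {   /* normalize nanoseconds */
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    int rc = 0;
    pthread_mutex_lock(&r->lock);
    /* Loop to tolerate spurious wakeups; stop on timeout or error. */
    while (!r->done && rc == 0)
        rc = pthread_cond_timedwait(&r->done_cv, &r->lock, &deadline);
    int result = r->done ? 0 : rc;
    pthread_mutex_unlock(&r->lock);
    return result;
}
```

On a timeout the caller gets control back and can decide what to do next (for example, send an abort) rather than being stuck in an uninterruptible loop.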
And really, identifying the command isn't hard. Each command already has
an identifier (namely the virtio ring index), so if in doubt you can
always use that.
To be foolproof, though, you might want to add a 'real' identifier (like
a 32- or 64-bit command tag), which would even allow you to catch
uninitialized / completed commands. This tends to be quite important when
implementing an 'abort' command, as the command referred to by the
'abort' command might have been completed by the time the hypervisor
processes the abort command.
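The reason a tag helps is that a ring slot is reused as soon as its command completes, so an abort that names only the slot can hit the wrong command. A minimal sketch, assuming a hypothetical device-side table indexed by ring slot where each active command also carries a 64-bit tag (0 meaning "free"); an abort must match both, otherwise the device reports that the target is gone:

```c
#include <stdint.h>

#define NSLOTS 4   /* hypothetical number of ring slots */

/* Hypothetical device-side record per ring slot. */
struct slot {
    uint64_t tag;   /* tag of the command occupying the slot; 0 = free */
};

static struct slot slots[NSLOTS];
static uint64_t next_tag = 1;   /* starts at 1 so a live tag is never 0 */

enum abort_result { ABORT_OK, ABORT_NOT_FOUND };

/* Start a command in slot idx and return its tag (0 if idx is invalid). */
static uint64_t start_cmd(unsigned idx)
{
    if (idx >= NSLOTS)
        return 0;
    slots[idx].tag = next_tag++;
    return slots[idx].tag;
}

/* Normal completion frees the slot for reuse. */
static void finish_cmd(unsigned idx)
{
    if (idx < NSLOTS)
        slots[idx].tag = 0;
}

/* The abort names both slot and tag. A tag mismatch means the target
 * already completed, possibly with the slot reused by a newer command. */
static enum abort_result abort_cmd(unsigned idx, uint64_t tag)
{
    if (idx >= NSLOTS || tag == 0 || slots[idx].tag != tag)
        return ABORT_NOT_FOUND;
    slots[idx].tag = 0;
    return ABORT_OK;
}
```

With only the slot index, a late abort for a completed command would kill whatever command now sits in that slot; with the tag it is safely rejected.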
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization