Re: Virtio-net - add timeouts to control commands

On 8/24/22 11:06, Jason Wang wrote:
On Wed, Aug 24, 2022 at 3:52 PM Alvaro Karsz <alvaro.karsz@xxxxxxxxxxxxx> wrote:

I think that we should add a timeout to the control virtqueue commands.
If the hypervisor crashes while handling a control command, the guest
will spin forever.
This may not be necessary in a virtual environment, where both the
hypervisor and the guest OS run on the same bare-metal machine, but it
is needed for a physical network device compatible with VirtIO.

(In these cases, the network device acts as the hypervisor, and the
server acts as the guest OS.)

The network device may fail to answer a control command, or may crash, leading
to a stall in the server.

My idea is to add a timeout large enough to allow slow devices to
complete the command.

I wrote a simple patch that returns false from virtnet_send_command in
case of timeouts.
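
A minimal userspace sketch of the idea, not the actual patch: the polling loop gives up once a deadline passes and reports failure instead of spinning forever. The mock_ctrl_dev structure, mock_poll_used() and send_command_with_timeout() are hypothetical names invented for this illustration. clock() is used only to keep the sketch portable; a kernel driver would more likely compare against jiffies.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the device side of the control virtqueue. */
struct mock_ctrl_dev {
    unsigned polls_until_done; /* polls needed before the device answers */
    bool dead;                 /* true: the device never answers */
    uint8_t status;            /* status written by the device, 0 means OK */
};

/* Hypothetical helper: true once the device has completed the command. */
static bool mock_poll_used(struct mock_ctrl_dev *dev)
{
    if (dev->dead)
        return false;
    if (dev->polls_until_done == 0)
        return true;
    dev->polls_until_done--;
    return false;
}

/*
 * Busy-wait for the command to complete, but stop after timeout_ms.
 * clock() counts CPU time, which keeps advancing while we spin.
 * Returns true only if the device answered in time with an OK status.
 */
static bool send_command_with_timeout(struct mock_ctrl_dev *dev,
                                      unsigned long timeout_ms)
{
    clock_t deadline = clock() + (clock_t)(timeout_ms * (CLOCKS_PER_SEC / 1000));

    while (!mock_poll_used(dev)) {
        if (clock() >= deadline)
            return false; /* timed out: the device did not answer */
    }
    return dev->status == 0;
}
```

With a mock device that never answers, send_command_with_timeout() returns false after roughly timeout_ms instead of hanging; with a responsive device it returns the command result as before.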

The timeout approach introduces some serious problems in cases when
the network device does answer the control command, but after the
timeout.

* The device will think that the command succeeded, while the server won't.
    This may be serious with the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command.
    The server may receive packets in an unexpected queue.

* virtqueue_get_buf will return the previous response for the next
    control command.
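
One possible mitigation for the second problem, sketched under assumptions: if each control command carried its own token, the driver could drop late answers to commands it already gave up on. The mock_used_ring structure, the per-command token and get_matching_completion() are all hypothetical and only simulate the idea in userspace.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8

/* Hypothetical completion record: token names the command it answers. */
struct completion {
    uint32_t token;
    uint8_t status;
};

/* Hypothetical stand-in for the used ring of the control virtqueue. */
struct mock_used_ring {
    struct completion slots[RING_SIZE];
    size_t head, tail;
};

/* Device side: publish a completion (no overflow check in this sketch). */
static void ring_push(struct mock_used_ring *r, uint32_t token, uint8_t status)
{
    r->slots[r->tail % RING_SIZE] = (struct completion){ token, status };
    r->tail++;
}

/* Driver side: take the oldest completion, if there is one. */
static bool ring_pop(struct mock_used_ring *r, struct completion *out)
{
    if (r->head == r->tail)
        return false;
    *out = r->slots[r->head % RING_SIZE];
    r->head++;
    return true;
}

/*
 * Return the completion for the command identified by expected,
 * discarding late answers to older commands that already timed out.
 * Returns false if the expected answer has not arrived yet.
 */
static bool get_matching_completion(struct mock_used_ring *r,
                                    uint32_t expected, uint8_t *status)
{
    struct completion c;

    while (ring_pop(r, &c)) {
        if (c.token == expected) {
            *status = c.status;
            return true;
        }
        /* Stale completion: skip it and keep looking. */
    }
    return false;
}
```

This only keeps the server from pairing an old response with a new command. It does nothing about the first problem: the device still believes the late command succeeded.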

Addressing this case by adding a timeout to the spec won't be easy,
since the network device and the server have different clocks, and the
server won't know when exactly the network device noticed the kick.

So maybe we should call virtnet_remove if we reach a timeout.

Or reset, but can we simply use an interrupt instead of busy waiting?


There are two possible ways of handling this:
a) Let the device do the timeout: pass a timeout value with the command, and allow the device to return an ETIMEDOUT error when the timeout expires. Then it's up to the device to do the necessary timeout handling; the server won't be involved at all (except for handling an ETIMEDOUT error).
b) Implement an 'abort' command. With that, the server controls the timeout and is allowed to send an 'abort' command when the timeout expires. That requires the device to be able to abort commands (which not all devices can do), but avoids having to implement timeout handling in the device.

We can actually specify both methods, and have configuration bits indicating which method is supported by the device.
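
As a rough sketch of how a driver might choose between the two methods: the feature bit names and values below are made up for illustration and are not defined by the VirtIO specification, and preferring the device-side timeout when both bits are offered is an arbitrary assumption.

```c
#include <stdint.h>

/* Hypothetical feature bits; neither is defined by the VirtIO spec. */
#define HYP_F_DEVICE_TIMEOUT (1ull << 0) /* device can return ETIMEDOUT */
#define HYP_F_CMD_ABORT      (1ull << 1) /* device supports an abort command */

enum timeout_method {
    TIMEOUT_NONE,      /* device offers neither method */
    TIMEOUT_BY_DEVICE, /* method a): the device enforces the timeout */
    TIMEOUT_BY_ABORT   /* method b): the server times out and sends abort */
};

/* Pick a method from the feature bits the device advertises. */
static enum timeout_method pick_timeout_method(uint64_t device_features)
{
    if (device_features & HYP_F_DEVICE_TIMEOUT)
        return TIMEOUT_BY_DEVICE; /* assumed preference, not mandated */
    if (device_features & HYP_F_CMD_ABORT)
        return TIMEOUT_BY_ABORT;
    return TIMEOUT_NONE;
}
```

A device that advertises neither bit would keep today's behaviour, so the server would still need some fallback such as a reset.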

I am very much in favour of having timeouts for virtio commands; we've had several massive customer escalations which could have been resolved if we had been able to set the command timeout in the VM. As this was for virtio-scsi/virtio-block, I would advocate having a generic virtio command timeout, not a protocol-specific one.

Cheers,

Hannes
--
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@xxxxxxx			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer