On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang wrote:
> Add supports for datagram type for virtio-vsock. Datagram
> sockets are connectionless and unreliable. To avoid contention
> with stream and other sockets, add two more virtqueues and
> a new feature bit to identify if those two new queues exist or not.
>
> Also add descriptions for resource management of datagram, which
> does not use the existing credit update mechanism associated with
> stream sockets.
>
> Signed-off-by: Jiang Wang <jiang.wang@xxxxxxxxxxxxx>
> ---
> V2 addressed the comments for the previous version.
>
>  virtio-vsock.tex | 62 +++++++++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 52 insertions(+), 10 deletions(-)
>
> diff --git a/virtio-vsock.tex b/virtio-vsock.tex
> index da7e641..62c12e0 100644
> --- a/virtio-vsock.tex
> +++ b/virtio-vsock.tex
> @@ -11,12 +11,25 @@ \subsection{Virtqueues}\label{sec:Device Types / Socket Device / Virtqueues}
>  \begin{description}
>  \item[0] rx
>  \item[1] tx
> +\item[2] datagram rx
> +\item[3] datagram tx
> +\item[4] event
> +\end{description}
> +The virtio socket device uses 5 queues if feature bit VIRTIO_VSOCK_F_DRGAM is set. Otherwise, it
> +only uses 3 queues, as the following. Rx and tx queues are always used for stream sockets.
> +
> +\begin{description}
> +\item[0] rx
> +\item[1] tx
>  \item[2] event
>  \end{description}
>

I suggest renaming "rx" and "tx" to "stream rx" and "stream tx"
virtqueues and also adding this:

  When behavior differs between stream and datagram rx/tx virtqueues
  their full names are used. Common behavior is simply described in
  terms of rx/tx virtqueues and applies to both stream and datagram
  virtqueues.

This way you won't need to duplicate portions of the spec that deal with
populating the virtqueues, for example.

It's also clearer to use a full name for stream rx/tx virtqueues instead
of calling them rx/tx virtqueues now that we have datagram rx/tx
virtqueues.
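
To make the two queue layouts concrete, here is a rough C sketch of how an
implementation might map the named virtqueues to indices depending on
whether the datagram feature bit was negotiated. The enum and function
names are hypothetical and appear in neither the patch nor the spec; only
the bit number (0) and the queue indices come from the quoted text, and the
feature name follows the spelling used in the Feature bits section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit number as defined in the patch under review. */
#define VIRTIO_VSOCK_F_DGRAM 0

/* Hypothetical logical queue names, using the suggested full names. */
enum vsock_vq {
    VSOCK_VQ_STREAM_RX,
    VSOCK_VQ_STREAM_TX,
    VSOCK_VQ_DGRAM_RX,
    VSOCK_VQ_DGRAM_TX,
    VSOCK_VQ_EVENT,
};

static bool vsock_has_dgram(uint64_t features)
{
    return (features >> VIRTIO_VSOCK_F_DGRAM) & 1;
}

/* 5 virtqueues with datagram support, 3 without. */
static unsigned int vsock_num_queues(uint64_t features)
{
    return vsock_has_dgram(features) ? 5 : 3;
}

/*
 * Return the virtqueue index for a logical queue, or -1 if that queue
 * does not exist for the negotiated feature set.
 */
static int vsock_vq_index(uint64_t features, enum vsock_vq vq)
{
    bool dgram = vsock_has_dgram(features);

    switch (vq) {
    case VSOCK_VQ_STREAM_RX:
        return 0;
    case VSOCK_VQ_STREAM_TX:
        return 1;
    case VSOCK_VQ_DGRAM_RX:
        return dgram ? 2 : -1;
    case VSOCK_VQ_DGRAM_TX:
        return dgram ? 3 : -1;
    case VSOCK_VQ_EVENT:
        /* The event queue moves from index 2 to index 4. */
        return dgram ? 4 : 2;
    }
    return -1;
}

/* Usage example: the event queue index depends on the feature bit. */
static void vsock_layout_example(void)
{
    uint64_t with_dgram = UINT64_C(1) << VIRTIO_VSOCK_F_DGRAM;

    assert(vsock_vq_index(0, VSOCK_VQ_EVENT) == 2);
    assert(vsock_vq_index(with_dgram, VSOCK_VQ_EVENT) == 4);
}
```

The point of the sketch is that the event virtqueue has no fixed index
under this proposal, so anything that refers to it has to go through the
negotiated features first.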
> +
>  \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
>
> -There are currently no feature bits defined for this device.
> +\begin{description}
> +\item[VIRTIO_VSOCK_F_DGRAM (0)] Device has support for datagram socket type.
> +\end{description}
>
>  \subsection{Device configuration layout}\label{sec:Device Types / Socket Device / Device configuration layout}
>
> @@ -107,6 +120,9 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op
>
>  \subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
>
> +Flow control applies to stream sockets; datagram sockets do not have
> +flow control.
> +
>  The tx virtqueue carries packets initiated by applications and replies to
>  received packets. The rx virtqueue carries packets initiated by the device and
>  replies to previously transmitted packets.
> @@ -140,12 +156,15 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
>  consists of a (cid, port number) tuple. The header fields used for this are
>  \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
>
> -Currently only stream sockets are supported. \field{type} is 1 for stream
> -socket types.
> +Currently stream and datagram (dgram) sockets are supported. \field{type} is 1 for stream
> +socket types. \field{type} is 3 for dgram socket types.
>
>  Stream sockets provide in-order, guaranteed, connection-oriented delivery
>  without message boundaries.
>
> +Datagram sockets provide connectionless unreliable messages of
> +a fixed maximum length.

Plus unordered (?) and with message boundaries. In other words:

  Datagram sockets provide unordered, unreliable, connectionless messages
  with message boundaries and a fixed maximum length.

I didn't think of the fixed maximum length aspect before. I guess the
intention is that the rx buffer size is the message size limit?
That's different from UDP messages, which can be fragmented into
multiple IP packets and can be close to 64KiB in size (or larger with
IPv6 jumbograms):
https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure

Is it possible to support large datagram messages in vsock? I'm a little
concerned that applications that run successfully over UDP will not be
portable if vsock has this limitation because it would impose extra
message boundaries that the application protocol might not tolerate.

> +
>  \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device / Device Operation / Buffer Space Management}
>  \field{buf_alloc} and \field{fwd_cnt} are used for buffer space management of
>  stream sockets. The guest and the device publish how much buffer space is
> @@ -162,7 +181,7 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
>  u32 peer_free = peer_buf_alloc - (tx_cnt - peer_fwd_cnt);
>  \end{lstlisting}
>
> -If there is insufficient buffer space, the sender waits until virtqueue buffers
> +For stream sockets, if there is insufficient buffer space, the sender waits until virtqueue buffers
>  are returned and checks \field{buf_alloc} and \field{fwd_cnt} again. Sending
>  the VIRTIO_VSOCK_OP_CREDIT_REQUEST packet queries how much buffer space is
>  available. The reply to this query is a VIRTIO_VSOCK_OP_CREDIT_UPDATE packet.
> @@ -170,16 +189,28 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
>  previously receiving a VIRTIO_VSOCK_OP_CREDIT_REQUEST packet. This allows
>  communicating updates any time a change in buffer space occurs.
>
> +Unlike stream sockets, dgram sockets do not use VIRTIO_VSOCK_OP_CREDIT_UPDATE or
> +VIRTIO_VSOCK_OP_CREDIT_REQUEST packets. The dgram buffer management
> +is split to two parts: tx side and rx side. For the tx side, there is
> +additional buffer space for each socket.

Plus:

... according to the driver and device's available memory resources.
The amount of tx buffer space is an implementation detail of both the
device and the driver. It is not visible to the other side and may be
controlled by the application or administrative resource limits.

What I'm trying to describe here is that the additional tx buffer space
isn't part of the device interface.

> +The dgram sender sends packets when the virtqueue or the additional buffer is not full.
> +When both are full, the sender
> +MUST return an appropriate error to the upper layer application.

MUST, SHOULD, etc clauses need to go into the
devicenormative/drivernormative sections. They cannot be in regular
text.

> +For the rx side, dgram also uses the \field{buf_alloc}. If it is full, the packet
> +is dropped by the receiver.

UDP is connectionless so any number of other sources can send messages
to the same destination, causing buf_alloc's value to be unpredictable.
Can you explain how buf_alloc works with datagram sockets in more
detail?

>  \drivernormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> -sufficient free buffer space for the payload.
> +For stream sockets, VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> +sufficient free buffer space for the payload. For dgram sockets, VIRTIO_VSOCK_OP_RW data packets
> +MAY be transmitted when the peer buffer is full. Then the packet will be dropped by the receiver.
>
>  All packets associated with a stream flow MUST contain valid information in
>  \field{buf_alloc} and \field{fwd_cnt} fields.
>
>  \devicenormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> -sufficient free buffer space for the payload.
> +For stream sockets, VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> +sufficient free buffer space for the payload. For dgram sockets, VIRTIO_VSOCK_OP_RW data packets
> +MAY be transmitted when the peer buffer is full. Then the packet will be dropped by the receiver.
>
>  All packets associated with a stream flow MUST contain valid information in
>  \field{buf_alloc} and \field{fwd_cnt} fields.
> @@ -203,14 +234,14 @@ \subsubsection{Receive and Transmit}\label{sec:Device Types / Socket Device / De
>  The \field{guest_cid} configuration field MUST be used as the source CID when
>  sending outgoing packets.
>
> -A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> +For stream sockets, A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
>  unknown \field{type} value.

What about datagram sockets? Please state what must happen and why.

>
>  \devicenormative{\paragraph}{Device Operation: Receive and Transmit}{Device Types / Socket Device / Device Operation / Receive and Transmit}
>
>  The \field{guest_cid} configuration field MUST NOT contain a reserved CID as listed in \ref{sec:Device Types / Socket Device / Device configuration layout}.
>
> -A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> +For stream sockets, A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
>  unknown \field{type} value.
>
>  \subsubsection{Stream Sockets}\label{sec:Device Types / Socket Device / Device Operation / Stream Sockets}
> @@ -240,6 +271,17 @@ \subsubsection{Stream Sockets}\label{sec:Device Types / Socket Device / Device O
>  destination) address tuple for a new connection while the other peer is still
>  processing the old connection.
>
> +\subsubsection{Datagram Sockets}\label{sec:Device Types / Socket Device / Device Operation / Stream Sockets}

s/Stream Sockets/Datagram Sockets/

> +
> +Datagram (dgram) sockets are connectionless and unreliable. The sender just sends
> +a message to the peer and hope it will be delivered. A VIRTIO_VSOCK_OP_RST reply is sent if

s/hope/hopes/

> +a receiving socket does not exist on the destination.
> +If the transmission or receiving buffers are full, the packets
> +are dropped. If the transmission buffer is full, an appropriate error message
> +is returned to the caller.

It's unclear whether the caller is the driver/device or something else.
I think you're referring to the application interface, which is outside
the scope of the VIRTIO spec. I suggest dropping the last sentence.
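
As a rough illustration of the difference between the stream and datagram
rules discussed above, here is a minimal C sketch. The peer_free()
calculation and the type values 1 and 3 come from the quoted spec text.
Everything else is an assumption made for illustration: the struct and
function names are hypothetical, and the receiver-side accounting (a
"used" counter compared against buf_alloc) is only one possible reading of
"if it is full, the packet is dropped", which is exactly the part the patch
leaves unspecified.

```c
#include <stdbool.h>
#include <stdint.h>

/* \field{type} values from the patch: 1 for stream, 3 for dgram. */
#define VSOCK_TYPE_STREAM 1
#define VSOCK_TYPE_DGRAM  3

/* Sender's view of the peer's credit; used for stream sockets only. */
struct vsock_credit {
    uint32_t peer_buf_alloc;
    uint32_t peer_fwd_cnt;
    uint32_t tx_cnt;
};

/* Same calculation as the lstlisting in the spec. */
static uint32_t peer_free(const struct vsock_credit *c)
{
    return c->peer_buf_alloc - (c->tx_cnt - c->peer_fwd_cnt);
}

/*
 * Sender side: may a VIRTIO_VSOCK_OP_RW packet with a payload of len
 * bytes be transmitted now? Stream sockets need enough peer credit;
 * dgram sockets skip the check and accept that the receiver may drop.
 */
static bool may_transmit(uint16_t type, const struct vsock_credit *c,
                         uint32_t len)
{
    if (type == VSOCK_TYPE_DGRAM)
        return true;
    return len <= peer_free(c);
}

/* ASSUMPTION: hypothetical receiver-side accounting for one dgram socket. */
struct dgram_rx {
    uint32_t buf_alloc; /* total receive buffer space */
    uint32_t used;      /* bytes queued but not yet read */
};

/* Receiver side: queue the datagram if it fits, otherwise drop it. */
static bool dgram_rx_accept(struct dgram_rx *rx, uint32_t len)
{
    if (len > rx->buf_alloc - rx->used)
        return false; /* dropped; the sender is not notified */
    rx->used += len;
    return true;
}
```

Note that in this reading buf_alloc is purely local state of the receiver
for datagram sockets: no sender consults it, so several senders targeting
the same socket do not need a consistent view of it.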