Re: [External] Re: [RFC v4] virtio-vsock: add description for datagram type

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jun 8, 2021 at 6:46 AM Stefano Garzarella <sgarzare@xxxxxxxxxx> wrote:
>
> On Fri, May 28, 2021 at 04:01:18AM +0000, Jiang Wang wrote:
> >From: "jiang.wang" <jiang.wang@xxxxxxxxxxxxx>
> >
> >Add supports for datagram type for virtio-vsock. Datagram
> >sockets are connectionless and unreliable. To avoid contention
> >with stream and other sockets, add two more virtqueues and
> >a new feature bit to identify if those two new queues exist or not.
> >
> >Also add descriptions for resource management of datagram, which
> >does not use the existing credit update mechanism associated with
> >stream sockets.
> >
> >Signed-off-by: Jiang Wang <jiang.wang@xxxxxxxxxxxxx>
> >---
> >
> >V2: addressed the comments for the previous version.
> >V3: add description for the mergeable receive buffer.
> >V4: add a feature bit for stream and reserver a bit for seqpacket.
> >Fix mrg_rxbuf related sentences.
> >
> > virtio-vsock.tex | 155 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
> > 1 file changed, 142 insertions(+), 13 deletions(-)
> >
> >diff --git a/virtio-vsock.tex b/virtio-vsock.tex
> >index da7e641..bacac3c 100644
> >--- a/virtio-vsock.tex
> >+++ b/virtio-vsock.tex
> >@@ -9,14 +9,41 @@ \subsection{Device ID}\label{sec:Device Types / Socket Device / Device ID}
> >
> > \subsection{Virtqueues}\label{sec:Device Types / Socket Device / Virtqueues}
> > \begin{description}
> >-\item[0] rx
> >-\item[1] tx
> >+\item[0] stream rx
> >+\item[1] stream tx
> >+\item[2] datagram rx
> >+\item[3] datagram tx
> >+\item[4] event
>
> Is there a particular reason to always have the event queue as the last
> one?
>
> Maybe it's better to add the datagram queues at the bottom, so the first
> 3 queues are always the same.
>
I am not sure. I think Linux kernel should be fine with what you described.
But I am not sure about QEMU. From the code, I see virtqueue is allocated
as an array, like following,

+ #ifdef CONFIG_VHOST_VSOCK_DGRAM
+    struct vhost_virtqueue vhost_vqs[4];
+ #else
    struct vhost_virtqueue vhost_vqs[2];
+ #endi

so I assume the virtqueues for tx/rx should be
continuous? I can try to put the new queues at the end and see if it
works or not.

btw, my qemu change is here:
https://github.com/Jiang1155/qemu/commit/6307aa7a0c347905a31f3ca6577923e2f6dd9d84

> >+\end{description}
> >+The virtio socket device uses 5 queues if feature bit VIRTIO_VSOCK_F_DRGAM is set. Otherwise, it
> >+only uses 3 queues, as the following.
> >+
> >+\begin{description}
> >+\item[0] stream rx
> >+\item[1] stream tx
> > \item[2] event
> > \end{description}
> >
> >+When behavior differs between stream and datagram rx/tx virtqueues
> >+their full names are used. Common behavior is simply described in
> >+terms of rx/tx virtqueues and applies to both stream and datagram
> >+virtqueues.
> >+
> > \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
> >
> >-There are currently no feature bits defined for this device.
> >+\begin{description}
> >+\item[VIRTIO_VSOCK_F_STREAM (0)] Device has support for stream socket type.
> >+\end{description}
> >+
> >+\begin{description}
> >+\item[VIRTIO_VSOCK_F_DGRAM (2)] Device has support for datagram socket
> >type.
> >+\end{description}
> >+
> >+\begin{description}
> >+\item[VIRTIO_VSOCK_F_MRG_RXBUF (3)] Driver can merge receive buffers.
> >+\end{description}
> >+
> >+If no feature bits are defined, then assume only VIRTIO_VSOCK_F_STREAM
> >is set.
>
> I'd say more like socket streams are supported, without reference to the
> feature bit, something like: "If no feature bits are defined, then
> assume device only supports stream socket type."
>
OK.

> >
> > \subsection{Device configuration layout}\label{sec:Device Types / Socket Device / Device configuration layout}
> >
> >@@ -64,6 +91,8 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op
> >
> > Packets transmitted or received contain a header before the payload:
> >
> >+If feature VIRTIO_VSOCK_F_MRG_RXBUF is not negotiated, use the following header.
> >+
> > \begin{lstlisting}
> > struct virtio_vsock_hdr {
> >       le64 src_cid;
> >@@ -79,6 +108,15 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op
> > };
> > \end{lstlisting}
> >
> >+If feature VIRTIO_VSOCK_F_MRG_RXBUF is negotiated, use the following header.
> >+\begin{lstlisting}
> >+struct virtio_vsock_hdr_mrg_rxbuf {
> >+      struct virtio_vsock_hdr hdr;
> >+      le16 num_buffers;
> >+};
> >+\end{lstlisting}
> >+
> >+
> > The upper 32 bits of src_cid and dst_cid are reserved and zeroed.
> >
> > Most packets simply transfer data but control packets are also used for
> >@@ -107,6 +145,9 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op
> >
> > \subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
> >
> >+Flow control applies to stream sockets; datagram sockets do not have
> >+flow control.
> >+
> > The tx virtqueue carries packets initiated by applications and replies to
> > received packets.  The rx virtqueue carries packets initiated by the device and
> > replies to previously transmitted packets.
> >@@ -140,12 +181,15 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
> > consists of a (cid, port number) tuple. The header fields used for this are
> > \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
> >
> >-Currently only stream sockets are supported. \field{type} is 1 for
> >stream
> >-socket types.
> >+Currently stream and datagram (dgram) sockets are supported. \field{type} is 1 for stream
> >+socket types. \field{type} is 3 for dgram socket types.
>
> When Arseny's change will merged, we can define and use
> VIRTIO_VSOCK_TYPE_DGRAM,.

Sure.
> >
> > Stream sockets provide in-order, guaranteed, connection-oriented
> > delivery
> > without message boundaries.
> >
> >+Datagram sockets provide unordered, unreliable, connectionless
> >messages
> >+with message boundaries and a maximum length.
> >+
> > \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device / Device Operation / Buffer Space Management}
> > \field{buf_alloc} and \field{fwd_cnt} are used for buffer space
> > management of
> > stream sockets. The guest and the device publish how much buffer space is
> >@@ -162,7 +206,7 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
> > u32 peer_free = peer_buf_alloc - (tx_cnt - peer_fwd_cnt);
> > \end{lstlisting}
> >
> >-If there is insufficient buffer space, the sender waits until virtqueue buffers
> >+For stream sockets, if there is insufficient buffer space, the sender waits until virtqueue buffers
> > are returned and checks \field{buf_alloc} and \field{fwd_cnt} again. Sending
> > the VIRTIO_VSOCK_OP_CREDIT_REQUEST packet queries how much buffer space is
> > available. The reply to this query is a VIRTIO_VSOCK_OP_CREDIT_UPDATE packet.
> >@@ -170,24 +214,55 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
> > previously receiving a VIRTIO_VSOCK_OP_CREDIT_REQUEST packet. This allows
> > communicating updates any time a change in buffer space occurs.
> >
> >+Unlike stream sockets, dgram sockets do not use VIRTIO_VSOCK_OP_CREDIT_UPDATE or
> >+VIRTIO_VSOCK_OP_CREDIT_REQUEST packets. The dgram buffer management
> >+is split to two parts: tx side and rx side. For the tx side, if the
>
> Maybe better to use sender and receiver, since we use tx and rx to
> identify the queues.

OK.

> >+virtqueue is full, the packet will be dropped.
> >+For the rx side, dgram also uses the \field{buf_alloc}. If it is full, the packet
> >+is dropped by the receiver.
>
> This sentence is a bit unclear.
> `buf_alloc` for stream socket is used to inform the other peer about the
> receive buffer space. In this case we are using the local information,
> so there is no need to refer to `buf_alloc`. We can write something
> like: "The packet is dropped by the receiver if there is no space in the
> receive buffer".

OK.

> >+
> >+\drivernormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Setting Up Receive Buffers}
> >+\begin{itemize}
> >+\item If VIRTIO_VSOCK_F_MRG_RXBUF is not negotiated, the driver SHOULD populate the receive queue(s)
> >+      with buffers of at least 1526 bytes for stream sockets and 4096
> >bytes for datagram sockets.
>
> Where does 1526 come from?

No specific reason. Any recommendations?

> We're adding a requirement for socket streams that wasn't there until
> now.

This is only when mergeable rxbuf bit is used. I think before this, the stream
rx buf should be at least bigger than the pkt header. We just did not put that
into the spec.

> >+\item If VIRTIO_VSOCK_F_MRG_RXBUF is negotiated, each buffer MUST be at
> >+least the size of the struct virtio_vsock_hdr_mgr_rxbuf.
> >+\end{itemize}
> >+
> >+\begin{note}
> >+Obviously each buffer can be split across multiple descriptor elements.
> >+\end{note}
> >+
> >+\devicenormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Setting Up Receive Buffers}
> >+The device MUST set \field{num_buffers} to the number of descriptors used when
> >+transmitting the  packet.
> >+
> >+The device MUST use only a single descriptor if VIRTIO_VSOCK_F_MRG_RXBUF
> >+is not negotiated.
> >+
> > \drivernormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> >-VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> >-sufficient free buffer space for the payload.
> >+For stream sockets, VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> >+sufficient free buffer space for the payload. For dgram sockets, VIRTIO_VSOCK_OP_RW data packets
> >+MAY be transmitted when the peer rx buffer is full. Then the packet will be dropped by the peer,
> >+and driver will not get any notification.
> >
> > All packets associated with a stream flow MUST contain valid
> > information in
> > \field{buf_alloc} and \field{fwd_cnt} fields.
> >
> > \devicenormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> >-VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> >-sufficient free buffer space for the payload.
> >+For stream sockets, VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> >+sufficient free buffer space for the payload. For dgram sockets, VIRTIO_VSOCK_OP_RW data packets
> >+MAY be transmitted when the peer rx buffer is full. Then the packet will be dropped by the peer,
> >+and the device will not get any notification.
> >
> > All packets associated with a stream flow MUST contain valid information in
> > \field{buf_alloc} and \field{fwd_cnt} fields.
> >
> > \subsubsection{Receive and Transmit}\label{sec:Device Types / Socket Device / Device Operation / Receive and Transmit}
> >-The driver queues outgoing packets on the tx virtqueue and incoming packet
> >+The driver queues outgoing packets on the tx virtqueue and allocates incoming packet
> > receive buffers on the rx virtqueue. Packets are of the following form:
> >
> >+If VIRTIO_VSOCK_F_MRG_RXBUF was not negotiated, use the following.
>
> Please use present as in the rest of the document,

Sure. I see both past tense and present tense in the spec, so I was a little bit
confused.

> > \begin{lstlisting}
> > struct virtio_vsock_packet {
> >     struct virtio_vsock_hdr hdr;
> >@@ -195,24 +270,70 @@ \subsubsection{Receive and Transmit}\label{sec:Device Types / Socket Device / De
> > };
> > \end{lstlisting}
> >
> >+Otherwise, use the following form:
>
> Maybe better to repeat:
>
> If VIRTIO_VSOCK_F_MRG_RXBUF is negotiated...
>
OK.

> >+\begin{lstlisting}
> >+struct virtio_vsock_packet_mrg_rxbuf {
> >+    struct virtio_vsock_hdr_mrg_rxbuf hdr;
> >+    u8 data[];
> >+};
> >+\end{lstlisting}
> >+
> >+
> > Virtqueue buffers for outgoing packets are read-only. Virtqueue buffers for
> > incoming packets are write-only.
> >
> >+When transmitting packets to the device, \field{num_buffers} is not
> >used.
> >+
> >+\begin{enumerate}
> >+\item \field{num_buffers} indicates how many descriptors
> >+  this packet is spread over (including this one).
> >+  This is valid only if VIRTIO_VSOCK_F_MRG_RXBUF was negotiated.
> >+  This allows receipt of large packets without having to allocate large
> >+  buffers: a packet that does not fit in a single buffer can flow
> >+  over to the next buffer, and so on. In this case, there will be
> >+  at least \field{num_buffers} used buffers in the virtqueue, and the device
> >+  chains them together to form a single packet in a way similar to
> >+  how it would store it in a single buffer spread over multiple
> >+  descriptors.
> >+  The other buffers will not begin with a struct virtio_vsock_hdr.
> >+
> >+  If VIRTIO_VSOCK_F_MRG_RXBUF was not negotiated, then only one
> >+  descriptor is used.
> >+
> >+\item If
> >+  \field{num_buffers} is one, then the entire packet will be
> >+  contained within this buffer, immediately following the struct
> >+  virtio_vsock_hdr.
> >+\end{enumerate}
> >+
> > \drivernormative{\paragraph}{Device Operation: Receive and
> > Transmit}{Device Types / Socket Device / Device Operation / Receive
> > and Transmit}
> >
> > The \field{guest_cid} configuration field MUST be used as the source CID when
> > sending outgoing packets.
> >
> >-A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> >+For stream and datagram sockets, A VIRTIO_VSOCK_OP_RST reply MUST be
>
> Perhaps we can leave it as it was before, since the `type` field
> identifies the socket type itself.

OK.

> >sent if a packet is received with an
> > unknown \field{type} value.
> >
> > \devicenormative{\paragraph}{Device Operation: Receive and Transmit}{Device Types / Socket Device / Device Operation / Receive and Transmit}
> >
> > The \field{guest_cid} configuration field MUST NOT contain a reserved CID as listed in \ref{sec:Device Types / Socket Device / Device configuration layout}.
> >
> >-A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> >+For stream and datagram sockets, A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> > unknown \field{type} value.
>
> Ditto.

OK.

> >
> >+If VIRTIO_VSOCK_F_MRG_RXBUF has been negotiated, the device MUST set
> >+\field{num_buffers} to indicate the number of buffers
> >+the packet (including the header) is spread over.
> >+
> >+If a receive packet is spread over multiple buffers, the device
> >+MUST use all buffers but the last (i.e. the first $\field{num_buffers} -
> >+1$ buffers) completely up to the full length of each buffer
> >+supplied by the driver.
> >+
> >+The device MUST use all buffers used by a single receive
> >+packet together, such that at least \field{num_buffers} are
> >+observed by driver as used.
> >+
> > \subsubsection{Stream Sockets}\label{sec:Device Types / Socket Device / Device Operation / Stream Sockets}
> >
> > Connections are established by sending a VIRTIO_VSOCK_OP_REQUEST packet. If a
> >@@ -240,6 +361,14 @@ \subsubsection{Stream Sockets}\label{sec:Device Types / Socket Device / Device O
> > destination) address tuple for a new connection while the other peer is still
> > processing the old connection.
> >
> >+\subsubsection{Datagram Sockets}\label{sec:Device Types / Socket Device / Device Operation / Datagram Sockets}
> >+
> >+Datagram (dgram) sockets are connectionless and unreliable. The sender just sends
> >+a message to the peer and hopes it will be delivered. A VIRTIO_VSOCK_OP_RST reply is sent if
> >+a receiving socket does not exist on the destination.
> >+If the transmission or receiving buffers are full, the packets
> >+are dropped.
> >+
> > \subsubsection{Device Events}\label{sec:Device Types / Socket Device / Device Operation / Device Events}
> >
> > Certain events are communicated by the device to the driver using the event
> >--
> >2.11.0
> >
>
> I don't know if maybe it's better to break this patch in two, one where
> we add datagram and one for mergeable buffer.
>
> But let's see what others think.

OK. I can definitely break it to two patches if necessary. Thanks for
all the comments.

Regards,

Jiang

>
> Thanks,
> Stefano
>
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization



[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux