On Mon, Jun 26, 2023 at 05:03:15PM +0200, Stefano Garzarella wrote: > On Fri, Jun 23, 2023 at 04:37:55AM +0000, Bobby Eshleman wrote: > > On Thu, Jun 22, 2023 at 06:09:12PM +0200, Stefano Garzarella wrote: > > > On Sun, Jun 11, 2023 at 11:49:02PM +0300, Arseniy Krasnov wrote: > > > > Hello Bobby! > > > > > > > > On 10.06.2023 03:58, Bobby Eshleman wrote: > > > > > This commit adds support for datagrams over virtio/vsock. > > > > > > > > > > Message boundaries are preserved on a per-skb and per-vq entry basis. > > > > > > > > I'm a little bit confused about the following case: let vhost sends 4097 bytes > > > > datagram to the guest. Guest uses 4096 RX buffers in it's virtio queue, each > > > > buffer has attached empty skb to it. Vhost places first 4096 bytes to the first > > > > buffer of guests RX queue, and 1 last byte to the second buffer. Now IIUC guest > > > > has two skb in it rx queue, and user in guest wants to read data - does it read > > > > 4097 bytes, while guest has two skb - 4096 bytes and 1 bytes? In seqpacket there is > > > > special marker in header which shows where message ends, and how it works here? > > > > > > I think the main difference is that DGRAM is not connection-oriented, so > > > we don't have a stream and we can't split the packet into 2 (maybe we > > > could, but we have no guarantee that the second one for example will be > > > not discarded because there is no space). > > > > > > So I think it is acceptable as a restriction to keep it simple. > > > > > > My only doubt is, should we make the RX buffer size configurable, > > > instead of always using 4k? > > > > > I think that is a really good idea. What mechanism do you imagine? > > Some parameter in sysfs? > I comment more on this below. > > > > For sendmsg() with buflen > VQ_BUF_SIZE, I think I'd like -ENOBUFS > > For the guest it should be easy since it allocates the buffers, but for > the host? > > Maybe we should add a field in the configuration space that reports some > sort of MTU. > > Something in addition to what Laura had proposed here: > https://markmail.org/message/ymhz7wllutdxji3e > That sounds good to me. IIUC vhost exposes the limit via the configuration space, and the guest can configure the RX buffer size up to that limit via sysfs? > > returned even though it is uncharacteristic of Linux sockets. > > Alternatively, silently dropping is okay... but seems needlessly > > unhelpful. > > UDP takes advantage of IP fragmentation, right? > But what happens if a fragment is lost? > > We should try to behave in a similar way. > AFAICT in UDP the sending socket will see EHOSTUNREACH on its error queue and the packet will be dropped. For more details: - the IP defragmenter will emit an ICMP_TIME_EXCEEDED from ip_expire() if the fragment queue is not completed within time. - Upon seeing ICMP_TIME_EXCEEDED, the sending stack will then add EHOSTUNREACH to the socket's error queue, as seen in __udp4_lib_err(). Given some updated man pages I think enqueuing EHOSTUNREACH is okay for vsock too. This also reserves ENOBUFS/ENOMEM only for shortage on local buffers / mem. What do you think? Thanks, Bobby