On Tue, Mar 30, 2021 at 3:42 AM Stefan Hajnoczi <stefanha@xxxxxxxxxx> wrote:
>
> On Mon, Mar 29, 2021 at 04:22:28PM -0700, Jiang Wang . wrote:
> > On Mon, Mar 29, 2021 at 2:26 AM Stefan Hajnoczi <stefanha@xxxxxxxxxx> wrote:
> > >
> > > On Fri, Mar 26, 2021 at 04:40:09PM -0700, Jiang Wang . wrote:
> > > > I thought about this and discussed it with my colleague Cong Wang.
> > > > One idea is to make the current asynchronous send_pkt flow synchronous,
> > > > so that if the virtqueue is full, the function can return ENOMEM all
> > > > the way back to the caller, and the caller can check the return value
> > > > of sendmsg and slow down when necessary.
> > > >
> > > > In the spec, we can put something like: if the virtqueue is full, the
> > > > caller should be notified with an error, etc.
> > > >
> > > > In terms of implementation, that means we will remove the current
> > > > send_pkt_work for both stream and dgram sockets. Currently, the
> > > > code path uses RCU and a work queue, then grabs a mutex in the
> > > > work queue function. Since we cannot grab a mutex inside an RCU
> > > > critical section, we would have to change the RCU to a normal
> > > > reference counting mechanism. I think this is doable. The drawback
> > > > is that reference counting in general costs a few more cycles than
> > > > RCU, so there is a small price to pay. Another option is to use
> > > > sleepable RCU (SRCU) and remove the work queue.
> > > >
> > > > What do you guys think?
> > >
> > > I think the tx code path is like this because of reliable delivery.
> > > Maybe a separate datagram rx/tx code path would be simpler?
> >
> > I thought about this too. dgram can have a separate rx/tx path from
> > the stream types. In this case, the_virtio_vsock will still be shared
> > because the virtqueues have to be in one structure. the_virtio_vsock
> > will then be protected by RCU plus a reference count (or a sleepable
> > RCU).
> >
> > In vhost_vsock_dev_release, it will wait for both the RCU grace
> > period and the reference count to drain, and then free the memory.
> > I think this is doable. Let me know if there is a better way to do
> > it. Btw, I think the dgram tx code path will be quite different from
> > stream, but the dgram rx path will be similar to the stream type.
> >
> > > Take the datagram tx virtqueue lock, try to add the packet into the
> > > virtqueue, and return -ENOBUFS if the virtqueue is full. Then use the
> > > datagram socket's sndbuf accounting to prevent queuing up too many
> > > packets. When a datagram tx virtqueue buffer is used by the device,
> > > select queued packets for transmission.
> >
> > I am not sure about the last sentence. In the new design, there will
> > be no other queued packets for dgram (except those in the virtqueue).
> > When a dgram tx virtqueue buffer is freed by the device, the driver
> > side just needs to free some space. There is no need to trigger more
> > transmission.
>
> This approach drops packets when the virtqueue is full. In order to get
> close to UDP semantics I think the following is necessary:
>
> 1. Packets are placed directly into the tx virtqueue when possible.
> 2. Packets are queued up to sndbuf bytes if the tx virtqueue is full.
> 3. When a full tx virtqueue gets some free space again, there is an
>    algorithm that selects from among the queued sockets.
>
> This is how unreliable delivery can be somewhat fair between competing
> threads. I thought you were trying to achieve this (it's similar to
> what UDP does)?

I see. I was thinking of something different, without an extra queue.
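For reference, what I had in mind was roughly the following: try the
virtqueue directly under the tx lock and let sendmsg() see the error
(ENOMEM in my earlier mail, -ENOBUFS as you suggested). This is only a
sketch and untested; VSOCK_VQ_DGRAM_TX and dgram_tx_lock are made-up
names:

/*
 * Sketch only: synchronous dgram send, no work queue and no extra
 * packet queue.  The virtqueue is tried directly and a full ring is
 * reported to the caller.
 */
static int virtio_transport_dgram_send_pkt(struct virtio_vsock *vsock,
					   struct virtio_vsock_pkt *pkt)
{
	struct scatterlist hdr_sg, buf_sg, *sgs[2];
	struct virtqueue *vq = vsock->vqs[VSOCK_VQ_DGRAM_TX];
	int ret;

	sg_init_one(&hdr_sg, &pkt->hdr, sizeof(pkt->hdr));
	sg_init_one(&buf_sg, pkt->buf, pkt->len);
	sgs[0] = &hdr_sg;
	sgs[1] = &buf_sg;

	mutex_lock(&vsock->dgram_tx_lock);
	ret = virtqueue_add_sgs(vq, sgs, 2, 0, pkt, GFP_KERNEL);
	if (ret < 0) {
		/* Ring full: surface the error so sendmsg() sees it. */
		mutex_unlock(&vsock->dgram_tx_lock);
		virtio_transport_free_pkt(pkt);
		return -ENOBUFS;
	}
	virtqueue_kick(vq);
	mutex_unlock(&vsock->dgram_tx_lock);
	return 0;
}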
I think your approach is better. For step 3, I think I will just start
with a simple FIFO algorithm first (rough sketch below the quote).

> Stefan
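To make that concrete, steps 1-3 with the simple FIFO could look
something like this. Again just an untested sketch: dgram_tx_queue,
dgram_tx_lock and virtio_vsock_dgram_add() (a thin GFP_ATOMIC wrapper
around virtqueue_add_sgs()) are invented names:

/* 1-2. Try the ring first; if it is full, park the packet on a FIFO.
 * The socket layer has already charged the packet against sndbuf,
 * which is what bounds this list.
 */
static int virtio_vsock_dgram_send(struct virtio_vsock *vsock,
				   struct virtio_vsock_pkt *pkt)
{
	int ret;

	spin_lock_bh(&vsock->dgram_tx_lock);
	ret = virtio_vsock_dgram_add(vsock, pkt);
	if (ret < 0) {
		list_add_tail(&pkt->list, &vsock->dgram_tx_queue);
		ret = 0;
	}
	spin_unlock_bh(&vsock->dgram_tx_lock);
	return ret;
}

/* 3. Runs when the device marks tx buffers used: move queued packets,
 * oldest first, into the space that was just freed.
 */
static void virtio_vsock_dgram_tx_done(struct virtio_vsock *vsock)
{
	struct virtio_vsock_pkt *pkt, *tmp;

	spin_lock_bh(&vsock->dgram_tx_lock);
	list_for_each_entry_safe(pkt, tmp, &vsock->dgram_tx_queue, list) {
		if (virtio_vsock_dgram_add(vsock, pkt) < 0)
			break;	/* ring full again, keep the rest queued */
		list_del(&pkt->list);
	}
	spin_unlock_bh(&vsock->dgram_tx_lock);
}

A single per-device packet FIFO is not fair between sockets the way
step 3 ultimately wants, but it matches the "simple FIFO first"
starting point; a per-socket round-robin pick could be swapped in
later without changing the enqueue side.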