On Tue, May 18, 2021 at 6:02 AM Stefano Garzarella <sgarzare@xxxxxxxxxx> wrote:
>
> On Mon, May 17, 2021 at 11:33:06PM -0700, Jiang Wang . wrote:
> >On Mon, May 17, 2021 at 4:02 AM Stefano Garzarella <sgarzare@xxxxxxxxxx> wrote:
> >>
> >> On Fri, May 14, 2021 at 11:55:29AM -0700, Jiang Wang . wrote:
> >> >On Fri, May 14, 2021 at 8:17 AM Stefano Garzarella <sgarzare@xxxxxxxxxx> wrote:
> >> >> On Thu, May 13, 2021 at 04:26:03PM -0700, Jiang Wang . wrote:
> >>
> >> [...]
> >>
> >> >> >I see. I will add some limit to dgram packets. Also, when the
> >> >> >virtqueues
> >> >> >are shared between stream and dgram, both of them need to grab a lock
> >> >> >before using the virtqueue, so one will not completely block another one.
> >> >>
> >> >> I'm not worried about the concurrent access that we definitely need to
> >> >> handle with a lock, but more about the uncontrolled packet sending that
> >> >> dgram might have, flooding the queues and preventing others from
> >> >> communicating.
> >> >
> >> >That is a valid concern. Let me explain how I would handle that if we
> >> >don't add two new virtqueues. For dgram, I also add a dgram_send_pkt_list,
> >> >which is similar to send_pkt_list for stream (and seqpacket). But there
> >> >is one difference. The dgram_send_pkt_list has a maximum size setting,
> >> >and keep tracking how many pkts are in the list. The track number
> >> >(dgram_send_pkt_list_size) is increased when a packet is added
> >> >to the list and is decreased when a packet
> >> >is removed from the list and added to the virtqueue. In
> >> >virtio_transport_send_pkt, if the current
> >> >dgram_send_pkt_list_size is equal
> >> >to the maximum ( let's say 128), then it will not add to the
> >> >dgram_send_pkt_list and return an error to the application.
> >>
> >> For stream socket, we have the send_pkt_list and the send worker because
> >> the virtqueue can be full and the transmitter needs to wait available
> >> slots, because we can't discard packets.
> >>
> >> For dgram I think we don't need this, so we can avoid the
> >> dgram_send_pkt_list and directly enqueue packets in the virtqueue.
> >>
> >> If there are no slots available, we can simply drop the packet.
> >> In this way we don't have to put limits other than the available space
> >> in the virtqueue.
> >
> >I considered this approach before. One question not clear to me is
> >whether we should call virtqueue_kick for every dgram packet. If I
> >am not wrong, each virtqueue_kick will lead to a vm_exit to the host.
> >If we call virtqueue_kick() for each dgram packet, the performance
> >might be bad when there are lots of packets.
>
> Not every virtqueue_kick() will lead to a vm_exit. If the host (see
> vhost_transport_do_send_pkt()) is handling the virtqueue, it disables
> the notification through vhost_disable_notify().

Got it.

> >One idea is to set
> >a threshold and a timer to call virtqueue_kick(). When there are
> >lots of packets, we wait until a threshold. When there are few packets,
> >the timer will make sure those packets will be delivered not too late.
>
> I think is better to keep the host polling a bit if we want to avoid
> some notifications (see vhost_net_busy_poll()).

Sure.

> >
> >Any other ideas for virtqueue_kick? If the threshold plus timer is fine,
> >I can go this direction too.
>
> I think is better to follow vhost_net_busy_poll() approach.

OK. I will follow the vhost_net_busy_poll() approach.
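
To be concrete, with your suggestion the guest-side dgram send path would
reduce to roughly the sketch below. This is only a sketch, not the actual
patch: the function name, error code and details are illustrative.

static int virtio_transport_send_dgram_pkt(struct virtio_vsock *vsock,
                                           struct virtio_vsock_pkt *pkt)
{
        struct virtqueue *vq = vsock->vqs[VSOCK_VQ_TX];
        struct scatterlist hdr, buf, *sgs[2];
        int out_sg = 0;
        int ret;

        sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
        sgs[out_sg++] = &hdr;
        if (pkt->buf) {
                sg_init_one(&buf, pkt->buf, pkt->len);
                sgs[out_sg++] = &buf;
        }

        mutex_lock(&vsock->tx_lock);

        ret = virtqueue_add_sgs(vq, sgs, out_sg, 0, pkt, GFP_KERNEL);
        if (ret < 0) {
                /* No slot available: simply drop the datagram, so no
                 * dgram_send_pkt_list and no extra accounting are needed.
                 */
                mutex_unlock(&vsock->tx_lock);
                virtio_transport_free_pkt(pkt);
                return -ENOBUFS;
        }

        /* Most of these kicks should become no-ops while the host is busy
         * polling, since it disables notifications via vhost_disable_notify().
         */
        virtqueue_kick(vq);

        mutex_unlock(&vsock->tx_lock);
        return 0;
}

The only back-pressure left on the send side is then the available space in
the TX virtqueue itself.
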
> >
> >> >
> >> >In this way, the number of pending dgram pkts to be sent is limited.
> >> >Then both stream and dgram sockets will compete to hold a lock
> >> >for the tx virtqueue. Depending on the linux scheduler, this competition
> >> >will be somewhat fair. As a result, dgram will not block stream completely.
> >> >It will compete and share the virtqueue with stream, but stream
> >> >can still send some pkts.
> >> >
> >> >Basically, the virtqueue becomes a first-come first-serve resource for
> >> >the stream and dgram. When both stream and dgram applications
> >> >have lots of data to send, dgram_send_pkt_list and send_pkt_list
> >> >will still be a limited size, and each will have some chance to send out
> >> >the data via virtqueue. Does this address your concern?
> >> >
> >> >
> >> >> So having 2 dedicated queues could avoid a credit mechanism at all for
> >> >> connection-less sockets, and simply the receiver discards packets that
> >> >> it can't handle.
> >> >
> >> >With 2 dedicated queues, we still need some kind of accounting for
> >> >the dgram. Like the dgram_send_pkt_size I mentioned above. Otherwise,
> >> >it will cause OOM. It is not a real credit mechanism, but code is similar
> >> >with or without 2 dedicated queues in my current prototype.
> >>
> >> I think in both cases we don't need an accounting, but we can drop
> >> packets if the virtqueue is full.
> >>
> >> It's still not clear to me where OOM may occur. Can you elaborate on
> >> this?
> >
> >OOM depending on the implementation details. If we keep
> >dgram_send_pkt_list, and put no limitation, it may increase indefinitely
> >and cause OOM. As long as we have some limitations or drop packets
> >quickly, then we should be fine.
>
> Sure, with the extra list we need some limitations.
>
> >
> >> >
> >> >For receiver discarding packets part, could you explain more? I think
> >> >receiver discarding pkts code also is same with or without 2 dedicated
> >> >queues.
> >>
> >> Yep.
> >>
> >> > Both will use can_enqueue = virtio_transport_inc_rx_pkt(vvs, pkt);
> >> >to check if a packet should be discarded or not.
> >>
> >> I don't think we can use virtio_transport_inc_rx_pkt() for dgram.
> >> This is based on the credit mechanism (e.g. buf_alloc) that works when
> >> you have a connection oriented socket.
> >>
> >> With dgram you can have multiple transmitter for a single socket, so how
> >> we can exchange that information?
> >>
> >In my view, we don't need to exchange that information. Buf_alloc for
> >dgram is local to a receiving socket. The sending sockets do not know
> >about that. For receiving socket, it just checks if there is still buffer
> >available to decide to drop a packet or not. If multiple transmitter
> >send to a single socket, they will share the same buf_alloc, and packets
> >will be dropped randomly for those transmitters.
>
> I see, but if we reuse skbuff I think we don't need this.
>
> >
> >> If we reuse the VMCI implementation with skbuff, the sk_receive_skb()
> >> already checks if the sk_rcvbuf is full and discards the packet in
> >> this
> >> case, so we don't need an internal rx_queue.
> >
> >OK.
> >
> >> Maybe someday we should convert stream to skbuff too, it would make our
> >> lives easier.
> >>
> >Yeah. I remember the original virtio vsock patch did use skb, but somehow
> >it did not use skb anymore when merging to the upstream, not sure why.
>
> Really? I didn't know, then I'll take a look. If you have any links,
> please send :-)

No. I take it back. The skb is only used for dgram, not for stream sockets.
Sorry for the confusion.
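
For what it's worth, the skb-based dgram receive path I am picturing
(borrowing from the VMCI transport, as you suggested) is roughly the sketch
below. The socket lookup helper name is just a placeholder; assume it
returns the bound socket with a reference held.

static void virtio_transport_recv_dgram(struct sk_buff *skb,
                                        struct sockaddr_vm *dst)
{
        struct sock *sk;

        sk = vsock_find_bound_dgram_socket(dst);        /* placeholder name */
        if (!sk) {
                kfree_skb(skb);
                return;
        }

        /* sk_receive_skb() drops the skb by itself when the socket receive
         * buffer (sk_rcvbuf) is full and releases the socket reference, so
         * no internal rx_queue and no buf_alloc /
         * virtio_transport_inc_rx_pkt() accounting are needed for dgram.
         */
        sk_receive_skb(sk, skb, 0);
}

With that, sk_rcvbuf becomes the only receive-side limit, and multiple
transmitters to the same socket simply share it.
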
The original RFC code I found is not in an easy-to-read format. Anyway, here
is the link if you are interested:

https://kvm.vger.kernel.narkive.com/BMvH9eEr/rfc-v2-0-7-introduce-vm-sockets-virtio-transport#post1

btw: as major parts of the spec are kind of settled, I will send my current
POC code to the mailing list just to have a reference and see if any other
major changes are needed. I will add the to-do's in the cover letter. Will
update the spec too.

> Thanks,
> Stefano
>
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization