Re: [PATCH v5] virtio/vsock: replace virtio_vsock_pkt with sk_buff

Bobby Eshleman <bobbyeshleman@xxxxxxxxx> · Mon, 21 Nov 2022 12:01:12 +0000

On Tue, Dec 06, 2022 at 11:20:21AM +0100, Paolo Abeni wrote:
> Hello,
> 
> On Fri, 2022-12-02 at 09:35 -0800, Bobby Eshleman wrote:
> [...]
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 35d7eedb5e8e..6c0b2d4da3fe 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -3,10 +3,129 @@
> >  #define _LINUX_VIRTIO_VSOCK_H
> >  
> >  #include <uapi/linux/virtio_vsock.h>
> > +#include <linux/bits.h>
> >  #include <linux/socket.h>
> >  #include <net/sock.h>
> >  #include <net/af_vsock.h>
> >  
> > +#define VIRTIO_VSOCK_SKB_HEADROOM (sizeof(struct virtio_vsock_hdr))
> > +
> > +enum virtio_vsock_skb_flags {
> > +	VIRTIO_VSOCK_SKB_FLAGS_REPLY		= BIT(0),
> > +	VIRTIO_VSOCK_SKB_FLAGS_TAP_DELIVERED	= BIT(1),
> > +};
> > +
> > +static inline struct virtio_vsock_hdr *virtio_vsock_hdr(struct sk_buff *skb)
> > +{
> > +	return (struct virtio_vsock_hdr *)skb->head;
> > +}
> > +
> > +static inline bool virtio_vsock_skb_reply(struct sk_buff *skb)
> > +{
> > +	return skb->_skb_refdst & VIRTIO_VSOCK_SKB_FLAGS_REPLY;
> > +}
> 
> I'm sorry for the late feedback. The above is extremelly risky: if the
> skb will land later into the networking stack, we could experience the
> most difficult to track bugs.
> 
> You should use the skb control buffer instead (skb->cb), with the
> additional benefit you could use e.g. bool - the compiler could emit
> better code to manipulate such fields - and you will not need to clear
> the field before release nor enqueue.
> 
> [...]
> 

Hey Paolo, thank you for the review. For my own learning, this would
happen presumably when the skb is dropped? And I assume we don't see
this in sockmap because it is always cleared before leaving sockmap's
hands? I sanity checked this patch with an out-of-tree patch I have that
uses the networking stack, but I suspect I didn't see issues because my
test harness didn't induce dropping...

I originally avoided skb->cb because the reply flag is set at allocation
and would potentially be clobbered by a pass through the networking
stack. The reply flag would be used after a pass through the networking
stack (e.g., during transmission at the device level and when sockets
close while skbs are still queued for xmit).

I suppose using skb->cb would look like something like this:
- use skb_clone() for reply skbs
- set reply on the cloned sk_buff skb->cb
- keep a hashmap mapping original skb to cloned skb (is there a better
  way?)
- when choosing to apply reply logic, if skb_cloned() refer to the
  hashmap

Is there a better/simpler way to maintain skb->cb?

> > @@ -352,37 +360,38 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> >  				   size_t len)
> >  {
> >  	struct virtio_vsock_sock *vvs = vsk->trans;
> > -	struct virtio_vsock_pkt *pkt;
> >  	size_t bytes, total = 0;
> > -	u32 free_space;
> > +	struct sk_buff *skb;
> >  	int err = -EFAULT;
> > +	u32 free_space;
> >  
> >  	spin_lock_bh(&vvs->rx_lock);
> > -	while (total < len && !list_empty(&vvs->rx_queue)) {
> > -		pkt = list_first_entry(&vvs->rx_queue,
> > -				       struct virtio_vsock_pkt, list);
> > +	while (total < len && !skb_queue_empty_lockless(&vvs->rx_queue)) {
> > +		skb = __skb_dequeue(&vvs->rx_queue);
> 
> Here the locking schema is confusing. It looks like vvs->rx_queue is
> under vvs->rx_lock protection, so the above should be skb_queue_empty()
> instead of the lockless variant.
> 
> [...]
> 
> > @@ -858,16 +873,11 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
> >  static void virtio_transport_remove_sock(struct vsock_sock *vsk)
> >  {
> >  	struct virtio_vsock_sock *vvs = vsk->trans;
> > -	struct virtio_vsock_pkt *pkt, *tmp;
> >  
> >  	/* We don't need to take rx_lock, as the socket is closing and we are
> >  	 * removing it.
> >  	 */
> > -	list_for_each_entry_safe(pkt, tmp, &vvs->rx_queue, list) {
> > -		list_del(&pkt->list);
> > -		virtio_transport_free_pkt(pkt);
> > -	}
> > -
> > +	virtio_vsock_skb_queue_purge(&vvs->rx_queue);
> 
> Still assuming rx_queue is under the rx_lock, given you don't need the
> locking here as per the above comment, you should use the lockless
> purge variant.
> 

Good catch, thanks!

Thanks again,
Bobby