On Tue, Dec 06, 2022 at 11:20:21AM +0100, Paolo Abeni wrote: > Hello, > > On Fri, 2022-12-02 at 09:35 -0800, Bobby Eshleman wrote: > [...] > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h > > index 35d7eedb5e8e..6c0b2d4da3fe 100644 > > --- a/include/linux/virtio_vsock.h > > +++ b/include/linux/virtio_vsock.h > > @@ -3,10 +3,129 @@ > > #define _LINUX_VIRTIO_VSOCK_H > > > > #include <uapi/linux/virtio_vsock.h> > > +#include <linux/bits.h> > > #include <linux/socket.h> > > #include <net/sock.h> > > #include <net/af_vsock.h> > > > > +#define VIRTIO_VSOCK_SKB_HEADROOM (sizeof(struct virtio_vsock_hdr)) > > + > > +enum virtio_vsock_skb_flags { > > + VIRTIO_VSOCK_SKB_FLAGS_REPLY = BIT(0), > > + VIRTIO_VSOCK_SKB_FLAGS_TAP_DELIVERED = BIT(1), > > +}; > > + > > +static inline struct virtio_vsock_hdr *virtio_vsock_hdr(struct sk_buff *skb) > > +{ > > + return (struct virtio_vsock_hdr *)skb->head; > > +} > > + > > +static inline bool virtio_vsock_skb_reply(struct sk_buff *skb) > > +{ > > + return skb->_skb_refdst & VIRTIO_VSOCK_SKB_FLAGS_REPLY; > > +} > > I'm sorry for the late feedback. The above is extremelly risky: if the > skb will land later into the networking stack, we could experience the > most difficult to track bugs. > > You should use the skb control buffer instead (skb->cb), with the > additional benefit you could use e.g. bool - the compiler could emit > better code to manipulate such fields - and you will not need to clear > the field before release nor enqueue. > > [...] > Hey Paolo, thank you for the review. For my own learning, this would happen presumably when the skb is dropped? And I assume we don't see this in sockmap because it is always cleared before leaving sockmap's hands? I sanity checked this patch with an out-of-tree patch I have that uses the networking stack, but I suspect I didn't see issues because my test harness didn't induce dropping... I originally avoided skb->cb because the reply flag is set at allocation and would potentially be clobbered by a pass through the networking stack. The reply flag would be used after a pass through the networking stack (e.g., during transmission at the device level and when sockets close while skbs are still queued for xmit). I suppose using skb->cb would look like something like this: - use skb_clone() for reply skbs - set reply on the cloned sk_buff skb->cb - keep a hashmap mapping original skb to cloned skb (is there a better way?) - when choosing to apply reply logic, if skb_cloned() refer to the hashmap Is there a better/simpler way to maintain skb->cb? > > @@ -352,37 +360,38 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk, > > size_t len) > > { > > struct virtio_vsock_sock *vvs = vsk->trans; > > - struct virtio_vsock_pkt *pkt; > > size_t bytes, total = 0; > > - u32 free_space; > > + struct sk_buff *skb; > > int err = -EFAULT; > > + u32 free_space; > > > > spin_lock_bh(&vvs->rx_lock); > > - while (total < len && !list_empty(&vvs->rx_queue)) { > > - pkt = list_first_entry(&vvs->rx_queue, > > - struct virtio_vsock_pkt, list); > > + while (total < len && !skb_queue_empty_lockless(&vvs->rx_queue)) { > > + skb = __skb_dequeue(&vvs->rx_queue); > > Here the locking schema is confusing. It looks like vvs->rx_queue is > under vvs->rx_lock protection, so the above should be skb_queue_empty() > instead of the lockless variant. > > [...] > > > @@ -858,16 +873,11 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t, > > static void virtio_transport_remove_sock(struct vsock_sock *vsk) > > { > > struct virtio_vsock_sock *vvs = vsk->trans; > > - struct virtio_vsock_pkt *pkt, *tmp; > > > > /* We don't need to take rx_lock, as the socket is closing and we are > > * removing it. > > */ > > - list_for_each_entry_safe(pkt, tmp, &vvs->rx_queue, list) { > > - list_del(&pkt->list); > > - virtio_transport_free_pkt(pkt); > > - } > > - > > + virtio_vsock_skb_queue_purge(&vvs->rx_queue); > > Still assuming rx_queue is under the rx_lock, given you don't need the > locking here as per the above comment, you should use the lockless > purge variant. > Good catch, thanks! Thanks again, Bobby