On Thu, Dec 10, 2020 at 7:36 AM Magnus Karlsson <magnus.karlsson@xxxxxxxxx> wrote:
>
> From: Magnus Karlsson <magnus.karlsson@xxxxxxxxx>
>
> Fix a race when multiple sockets are simultaneously calling sendto()
> when the completion ring is shared in the SKB case. This is the case
> when you share the same netdev and queue id through the
> XDP_SHARED_UMEM bind flag. The problem is that multiple processes can
> be in xsk_generic_xmit() and call the backpressure mechanism in
> xskq_prod_reserve(xs->pool->cq). As this is a shared resource in this
> specific scenario, a race might occur since the rings are
> single-producer single-consumer.
>
> Fix this by moving the tx_completion_lock from the socket to the pool
> as the pool is shared between the sockets that share the completion
> ring. (The pool is not shared when this is not the case.) And then
> protect the accesses to xskq_prod_reserve() with this lock. The
> tx_completion_lock is renamed cq_lock to better reflect that it
> protects accesses to the potentially shared completion ring.
>
> Fixes: 35fcde7f8deb ("xsk: support for Tx")
> Fixes: a9744f7ca200 ("xsk: fix potential race in SKB TX completion code")
> Signed-off-by: Magnus Karlsson <magnus.karlsson@xxxxxxxxx>
> Reported-by: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx>
> ---
>  include/net/xdp_sock.h      | 4 ----
>  include/net/xsk_buff_pool.h | 5 +++++
>  net/xdp/xsk.c               | 9 ++++++---
>  net/xdp/xsk_buff_pool.c     | 1 +
>  4 files changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> index 4f4e93bf814c..cc17bc957548 100644
> --- a/include/net/xdp_sock.h
> +++ b/include/net/xdp_sock.h
> @@ -58,10 +58,6 @@ struct xdp_sock {
>
>  	struct xsk_queue *tx ____cacheline_aligned_in_smp;
>  	struct list_head tx_list;
> -	/* Mutual exclusion of NAPI TX thread and sendmsg error paths
> -	 * in the SKB destructor callback.
> -	 */
> -	spinlock_t tx_completion_lock;
>  	/* Protects generic receive. */
>  	spinlock_t rx_lock;
>
> diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
> index 01755b838c74..eaa8386dbc63 100644
> --- a/include/net/xsk_buff_pool.h
> +++ b/include/net/xsk_buff_pool.h
> @@ -73,6 +73,11 @@ struct xsk_buff_pool {
>  	bool dma_need_sync;
>  	bool unaligned;
>  	void *addrs;
> +	/* Mutual exclusion of the completion ring in the SKB mode. Two cases to protect:
> +	 * NAPI TX thread and sendmsg error paths in the SKB destructor callback and when
> +	 * sockets share a single cq when the same netdev and queue id is shared.
> +	 */
> +	spinlock_t cq_lock;
>  	struct xdp_buff_xsk *free_heads[];
>  };
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 62504471fd20..42cb5f94d49e 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -364,9 +364,9 @@ static void xsk_destruct_skb(struct sk_buff *skb)
>  	struct xdp_sock *xs = xdp_sk(skb->sk);
>  	unsigned long flags;
>
> -	spin_lock_irqsave(&xs->tx_completion_lock, flags);
> +	spin_lock_irqsave(&xs->pool->cq_lock, flags);
>  	xskq_prod_submit_addr(xs->pool->cq, addr);
> -	spin_unlock_irqrestore(&xs->tx_completion_lock, flags);
> +	spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
>
>  	sock_wfree(skb);
>  }
> @@ -378,6 +378,7 @@ static int xsk_generic_xmit(struct sock *sk)
>  	bool sent_frame = false;
>  	struct xdp_desc desc;
>  	struct sk_buff *skb;
> +	unsigned long flags;
>  	int err = 0;
>
>  	mutex_lock(&xs->mutex);
>
> @@ -409,10 +410,13 @@ static int xsk_generic_xmit(struct sock *sk)
>  		 * if there is space in it. This avoids having to implement
>  		 * any buffering in the Tx path.
>  		 */
> +		spin_lock_irqsave(&xs->pool->cq_lock, flags);
>  		if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
> +			spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
>  			kfree_skb(skb);
>  			goto out;
>  		}
> +		spin_unlock_irqrestore(&xs->pool->cq_lock, flags);

Lock/unlock for every packet? Do you have any performance concerns?
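
To make the shared-cq scenario from the commit message concrete, below is a minimal
user-space sketch, not the actual xsk code: the names fake_cq, cq_reserve() and
producer() are made up for illustration. Two threads reserving on one single-producer
ring can both pass the space check unless the reservation step is serialized, which is
what taking pool->cq_lock around xskq_prod_reserve() does in the patch.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define RING_SIZE  8
#define ITERATIONS 1000000

struct fake_cq {
	uint32_t cached_prod;		/* producer index; an SPSC ring assumes one writer */
	uint32_t cached_cons;		/* consumer index; never advanced in this sketch */
	pthread_spinlock_t lock;	/* plays the role of pool->cq_lock */
};

/* Reserve one completion slot; returns 0 on success, -1 if the ring is full. */
static int cq_reserve(struct fake_cq *q, int use_lock)
{
	int ret = -1;

	if (use_lock)
		pthread_spin_lock(&q->lock);

	if (q->cached_prod - q->cached_cons < RING_SIZE) {
		/* Without the lock, two threads can both pass the check above
		 * and over-reserve, breaking the single-producer assumption.
		 */
		q->cached_prod++;
		ret = 0;
	}

	if (use_lock)
		pthread_spin_unlock(&q->lock);
	return ret;
}

/* Each thread models one socket in xsk_generic_xmit() reserving on the shared cq. */
static void *producer(void *arg)
{
	struct fake_cq *q = arg;

	for (int i = 0; i < ITERATIONS; i++)
		cq_reserve(q, 1);	/* change 1 to 0 to drop the serialization */
	return NULL;
}

int main(void)
{
	struct fake_cq q = { 0 };
	pthread_t t1, t2;

	pthread_spin_init(&q.lock, PTHREAD_PROCESS_PRIVATE);
	pthread_create(&t1, NULL, producer, &q);
	pthread_create(&t2, NULL, producer, &q);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);

	/* With the lock, at most RING_SIZE entries are ever outstanding. */
	printf("outstanding reservations: %u\n", q.cached_prod - q.cached_cons);
	return 0;
}

Build with something like gcc -O2 -pthread sketch.c. With use_lock flipped to 0 the
ring invariant can be violated, although a single run will not always hit the window.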