On Thu, Dec 10, 2020 at 10:53 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Thu, Dec 10, 2020 at 7:36 AM Magnus Karlsson
> <magnus.karlsson@xxxxxxxxx> wrote:
> >
> > From: Magnus Karlsson <magnus.karlsson@xxxxxxxxx>
> >
> > Fix a race when multiple sockets are simultaneously calling sendto()
> > when the completion ring is shared in the SKB case. This is the case
> > when you share the same netdev and queue id through the
> > XDP_SHARED_UMEM bind flag. The problem is that multiple processes can
> > be in xsk_generic_xmit() and call the backpressure mechanism in
> > xskq_prod_reserve(xs->pool->cq). As this is a shared resource in this
> > specific scenario, a race might occur since the rings are
> > single-producer single-consumer.
> >
> > Fix this by moving the tx_completion_lock from the socket to the pool
> > as the pool is shared between the sockets that share the completion
> > ring. (The pool is not shared when this is not the case.) And then
> > protect the accesses to xskq_prod_reserve() with this lock. The
> > tx_completion_lock is renamed cq_lock to better reflect that it
> > protects accesses to the potentially shared completion ring.
> >
> > Fixes: 35fcde7f8deb ("xsk: support for Tx")
> > Fixes: a9744f7ca200 ("xsk: fix potential race in SKB TX completion code")
> > Signed-off-by: Magnus Karlsson <magnus.karlsson@xxxxxxxxx>
> > Reported-by: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx>
> > ---
> >  include/net/xdp_sock.h      | 4 ----
> >  include/net/xsk_buff_pool.h | 5 +++++
> >  net/xdp/xsk.c               | 9 ++++++---
> >  net/xdp/xsk_buff_pool.c     | 1 +
> >  4 files changed, 12 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> > index 4f4e93bf814c..cc17bc957548 100644
> > --- a/include/net/xdp_sock.h
> > +++ b/include/net/xdp_sock.h
> > @@ -58,10 +58,6 @@ struct xdp_sock {
> >
> >         struct xsk_queue *tx ____cacheline_aligned_in_smp;
> >         struct list_head tx_list;
> > -       /* Mutual exclusion of NAPI TX thread and sendmsg error paths
> > -        * in the SKB destructor callback.
> > -        */
> > -       spinlock_t tx_completion_lock;
> >         /* Protects generic receive. */
> >         spinlock_t rx_lock;
> >
> > diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
> > index 01755b838c74..eaa8386dbc63 100644
> > --- a/include/net/xsk_buff_pool.h
> > +++ b/include/net/xsk_buff_pool.h
> > @@ -73,6 +73,11 @@ struct xsk_buff_pool {
> >         bool dma_need_sync;
> >         bool unaligned;
> >         void *addrs;
> > +       /* Mutual exclusion of the completion ring in the SKB mode. Two cases to protect:
> > +        * NAPI TX thread and sendmsg error paths in the SKB destructor callback and when
> > +        * sockets share a single cq when the same netdev and queue id is shared.
> > +        */
> > +       spinlock_t cq_lock;
> >         struct xdp_buff_xsk *free_heads[];
> >  };
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 62504471fd20..42cb5f94d49e 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -364,9 +364,9 @@ static void xsk_destruct_skb(struct sk_buff *skb)
> >         struct xdp_sock *xs = xdp_sk(skb->sk);
> >         unsigned long flags;
> >
> > -       spin_lock_irqsave(&xs->tx_completion_lock, flags);
> > +       spin_lock_irqsave(&xs->pool->cq_lock, flags);
> >         xskq_prod_submit_addr(xs->pool->cq, addr);
> > -       spin_unlock_irqrestore(&xs->tx_completion_lock, flags);
> > +       spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
> >
> >         sock_wfree(skb);
> >  }
> > @@ -378,6 +378,7 @@ static int xsk_generic_xmit(struct sock *sk)
> >         bool sent_frame = false;
> >         struct xdp_desc desc;
> >         struct sk_buff *skb;
> > +       unsigned long flags;
> >         int err = 0;
> >
> >         mutex_lock(&xs->mutex);
> > @@ -409,10 +410,13 @@ static int xsk_generic_xmit(struct sock *sk)
> >                  * if there is space in it. This avoids having to implement
> >                  * any buffering in the Tx path.
> >                  */
> > +               spin_lock_irqsave(&xs->pool->cq_lock, flags);
> >                 if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
> > +                       spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
> >                         kfree_skb(skb);
> >                         goto out;
> >                 }
> > +               spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
>
> Lock/unlock for every packet?
> Do you have any performance concerns?

I have measured and the performance impact of this is in the noise.
There is unfortunately already a lot of code being executed per packet
in this path. On the positive side, once this bug fix trickles down to
bpf-next, I will submit a rather simple patch to this function that
improves throughput by 15% for the txpush microbenchmark. This will
more than compensate for any locking introduced.
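
For context, the scenario the commit message describes can be set up
from user space roughly as below. This is a minimal, hypothetical
sketch, not part of the patch: UMEM registration (XDP_UMEM_REG) and
per-socket ring setup via setsockopt(SOL_XDP, ...) are elided, and
"eth0" and queue 0 are placeholders. Two sockets are bound to the same
netdev and queue id with XDP_SHARED_UMEM, so they share one completion
ring, and a send kick on either of them can enter xsk_generic_xmit()
concurrently in copy (SKB) mode.

#include <net/if.h>
#include <sys/socket.h>
#include <linux/if_xdp.h>

static void shared_cq_scenario(void)
{
        /* UMEM registration and ring setup on both sockets elided. */
        int fd1 = socket(AF_XDP, SOCK_RAW, 0);
        int fd2 = socket(AF_XDP, SOCK_RAW, 0);

        struct sockaddr_xdp sxdp = {
                .sxdp_family = AF_XDP,
                .sxdp_flags = XDP_COPY,                  /* copy (SKB) mode */
                .sxdp_ifindex = if_nametoindex("eth0"),  /* placeholder netdev */
                .sxdp_queue_id = 0,
        };
        bind(fd1, (struct sockaddr *)&sxdp, sizeof(sxdp));

        /* The second socket shares the first one's umem and completion
         * ring by binding to the same netdev/queue with XDP_SHARED_UMEM.
         */
        sxdp.sxdp_flags = XDP_SHARED_UMEM;
        sxdp.sxdp_shared_umem_fd = fd1;
        bind(fd2, (struct sockaddr *)&sxdp, sizeof(sxdp));

        /* With descriptors queued on each socket's tx ring, two
         * processes/threads may kick the kernel at the same time, so
         * both sockets reserve completion slots in the same cq.
         */
        sendto(fd1, NULL, 0, MSG_DONTWAIT, NULL, 0);
        sendto(fd2, NULL, 0, MSG_DONTWAIT, NULL, 0);
}

Without a pool-level lock, those two kicks can race in
xskq_prod_reserve() on the shared single-producer cq; with this patch
both the reservation in xsk_generic_xmit() and the completion in
xsk_destruct_skb() are serialized by cq_lock.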