On Fri, Dec 11, 2020 at 8:47 AM Magnus Karlsson <magnus.karlsson@xxxxxxxxx> wrote:
>
> On Thu, Dec 10, 2020 at 10:53 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Thu, Dec 10, 2020 at 7:36 AM Magnus Karlsson
> > <magnus.karlsson@xxxxxxxxx> wrote:
> > >
> > > From: Magnus Karlsson <magnus.karlsson@xxxxxxxxx>
> > >
> > > Fix a race when multiple sockets are simultaneously calling sendto()
> > > when the completion ring is shared in the SKB case. This is the case
> > > when you share the same netdev and queue id through the
> > > XDP_SHARED_UMEM bind flag. The problem is that multiple processes can
> > > be in xsk_generic_xmit() and call the backpressure mechanism in
> > > xskq_prod_reserve(xs->pool->cq). As this is a shared resource in this
> > > specific scenario, a race might occur since the rings are
> > > single-producer single-consumer.
> > >
> > > Fix this by moving the tx_completion_lock from the socket to the pool
> > > as the pool is shared between the sockets that share the completion
> > > ring. (The pool is not shared when this is not the case.) And then
> > > protect the accesses to xskq_prod_reserve() with this lock. The
> > > tx_completion_lock is renamed cq_lock to better reflect that it
> > > protects accesses to the potentially shared completion ring.
> > >
> > > Fixes: 35fcde7f8deb ("xsk: support for Tx")
> > > Fixes: a9744f7ca200 ("xsk: fix potential race in SKB TX completion code")
> > > Signed-off-by: Magnus Karlsson <magnus.karlsson@xxxxxxxxx>
> > > Reported-by: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx>
> > > ---
> > >  include/net/xdp_sock.h      | 4 ----
> > >  include/net/xsk_buff_pool.h | 5 +++++
> > >  net/xdp/xsk.c               | 9 ++++++---
> > >  net/xdp/xsk_buff_pool.c     | 1 +
> > >  4 files changed, 12 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> > > index 4f4e93bf814c..cc17bc957548 100644
> > > --- a/include/net/xdp_sock.h
> > > +++ b/include/net/xdp_sock.h
> > > @@ -58,10 +58,6 @@ struct xdp_sock {
> > >
> > >         struct xsk_queue *tx ____cacheline_aligned_in_smp;
> > >         struct list_head tx_list;
> > > -       /* Mutual exclusion of NAPI TX thread and sendmsg error paths
> > > -        * in the SKB destructor callback.
> > > -        */
> > > -       spinlock_t tx_completion_lock;
> > >         /* Protects generic receive. */
> > >         spinlock_t rx_lock;
> > >
> > > diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
> > > index 01755b838c74..eaa8386dbc63 100644
> > > --- a/include/net/xsk_buff_pool.h
> > > +++ b/include/net/xsk_buff_pool.h
> > > @@ -73,6 +73,11 @@ struct xsk_buff_pool {
> > >         bool dma_need_sync;
> > >         bool unaligned;
> > >         void *addrs;
> > > +       /* Mutual exclusion of the completion ring in the SKB mode. Two cases to protect:
> > > +        * NAPI TX thread and sendmsg error paths in the SKB destructor callback and when
> > > +        * sockets share a single cq when the same netdev and queue id is shared.
> > > +        */
> > > +       spinlock_t cq_lock;
> > >         struct xdp_buff_xsk *free_heads[];
> > >  };
> > >
> > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > index 62504471fd20..42cb5f94d49e 100644
> > > --- a/net/xdp/xsk.c
> > > +++ b/net/xdp/xsk.c
> > > @@ -364,9 +364,9 @@ static void xsk_destruct_skb(struct sk_buff *skb)
> > >         struct xdp_sock *xs = xdp_sk(skb->sk);
> > >         unsigned long flags;
> > >
> > > -       spin_lock_irqsave(&xs->tx_completion_lock, flags);
> > > +       spin_lock_irqsave(&xs->pool->cq_lock, flags);
> > >         xskq_prod_submit_addr(xs->pool->cq, addr);
> > > -       spin_unlock_irqrestore(&xs->tx_completion_lock, flags);
> > > +       spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
> > >
> > >         sock_wfree(skb);
> > >  }
> > > @@ -378,6 +378,7 @@ static int xsk_generic_xmit(struct sock *sk)
> > >         bool sent_frame = false;
> > >         struct xdp_desc desc;
> > >         struct sk_buff *skb;
> > > +       unsigned long flags;
> > >         int err = 0;
> > >
> > >         mutex_lock(&xs->mutex);
> > > @@ -409,10 +410,13 @@ static int xsk_generic_xmit(struct sock *sk)
> > >                  * if there is space in it. This avoids having to implement
> > >                  * any buffering in the Tx path.
> > >                  */
> > > +               spin_lock_irqsave(&xs->pool->cq_lock, flags);
> > >                 if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
> > > +                       spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
> > >                         kfree_skb(skb);
> > >                         goto out;
> > >                 }
> > > +               spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
> >
> > Lock/unlock for every packet?
> > Do you have any performance concerns?
>
> I have measured and the performance impact of this is in the noise.
> There is unfortunately already a lot of code being executed per packet
> in this path. On the positive side, once this bug fix trickles down to
> bpf-next, I will submit a rather simple patch to this function that
> improves throughput by 15% for the txpush microbenchmark. This will
> more than compensate for any locking introduced.

Please ignore/drop this patch. I will include this one as patch 1 in a
patch set of 2 that I will submit. The second patch will be a fix to
the reservation problem spotted by Xuan and it needs the locking of
the first patch to be able to work.
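
For anyone following the thread, here is a minimal user-space sketch of
the class of race being fixed (illustrative only: the ring model, names
and sizes below are made up and far simpler than the real
xskq_prod_reserve(), and the unlocked pass is deliberately a data race).
Two senders doing an unlocked check-then-claim on a shared
single-producer ring can both be granted the same completion slot,
which is what taking pool->cq_lock around the reserve prevents:

/* Illustrative user-space sketch only -- NOT kernel code. A much
 * simplified model of a single-producer ring's "reserve one slot"
 * step, shown handing out more slots than exist when two producers
 * call it without a lock. All names and sizes here are made up.
 */
#include <pthread.h>
#include <stdio.h>

#define RING_SIZE 128
#define TRIES     1000000

struct fake_ring {
        unsigned int cached_prod;       /* slots claimed by the producer */
        unsigned int cached_cons;       /* snapshot of the consumer side */
};

static struct fake_ring ring;
static pthread_mutex_t cq_lock = PTHREAD_MUTEX_INITIALIZER;
static int use_lock;

/* Check-then-claim, safe for a single producer only: a caller that
 * gets 0 back believes it owns one free slot in the ring. */
static int fake_prod_reserve(struct fake_ring *r)
{
        if (r->cached_prod - r->cached_cons >= RING_SIZE)
                return -1;              /* full: apply backpressure */
        r->cached_prod++;               /* claim one slot */
        return 0;
}

/* Each "socket" tries to reserve completion slots for its packets. */
static void *producer(void *arg)
{
        unsigned long *granted = arg;

        for (int i = 0; i < TRIES; i++) {
                if (use_lock)
                        pthread_mutex_lock(&cq_lock);
                if (!fake_prod_reserve(&ring))
                        (*granted)++;
                if (use_lock)
                        pthread_mutex_unlock(&cq_lock);
        }
        return NULL;
}

int main(void)
{
        for (use_lock = 0; use_lock <= 1; use_lock++) {
                unsigned long granted_a = 0, granted_b = 0;
                pthread_t a, b;

                ring.cached_prod = ring.cached_cons = 0;
                pthread_create(&a, NULL, producer, &granted_a);
                pthread_create(&b, NULL, producer, &granted_b);
                pthread_join(a, NULL);
                pthread_join(b, NULL);

                /* Unlocked: a lost update can let both threads "win" the
                 * same slot, so more than RING_SIZE reservations may be
                 * granted. Locked: exactly RING_SIZE, then backpressure. */
                printf("%s lock: %lu reservations granted, ring holds %d\n",
                       use_lock ? "with" : "without",
                       granted_a + granted_b, RING_SIZE);
        }
        return 0;
}

Built without optimizations (plain "gcc -pthread", file name is just an
example), the unlocked pass will usually report more granted
reservations than the ring holds, while the locked pass stops at
exactly RING_SIZE.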