Re: [PATCH bpf] xsk: fix corrupted packets for XDP_SHARED_UMEM

Thanks Magnus,

Results on the E810 are mixed.

In terms of RSS:

Tested with 8, 4 and 2 channels, processing a low volume of packets first (e.g. 8kpps, 4kpps and 2kpps respectively per setup).

* On an 8 channel setup, it is usually channels 0-3 that see traffic, while 4-7 do not. (I’ve since observed a run where queues 0-3 and queue 5 saw packets, whereas queues 4, 6 and 7 did not.)

* On a 4 channel setup, I’ve seen runs where channels 0 and 3 see traffic but 1 and 2 do not.

* On a 2 channel setup I can’t replicate the behaviour of only one channel receiving traffic; both always seem to work.

If a queue does not see any packets on a test run, I see what I think is a proportional increase in the ethtool rx_dropped counter.

In a very low PPS test (e.g. 8 channels, 800pps) I can actually observe broadly the right proportion of traffic going to the queues that do work, e.g. ~100pps each, up to the 4096 limit before the stall. I don’t know if that’s enough data to infer that RSS itself is working.

In terms of queue stalls:

Each queue seems to be able to process at most 4096 packets before stalling, at which point the ethtool rx_dropped counter starts incrementing. (It may already have been incrementing if the queue failed to work at all, per the RSS notes above.)

Interestingly, queues sometimes stall slightly below this - I’ve observed 4091, 4093, 4095 etc. I’d have to do more repeated testing to try to find the exact threshold, but once a queue reaches around 4096 packets or just below, it stops being serviced.

I can reproduce this even with a single queue.

Also worth noting: this behaviour is seen both with and without --poll and --busy-poll. It’s also observed if I don’t specify --zero-copy, but goes away if I force copy mode with --copy. Off the top of my head I’m not sure whether the XDP infrastructure defaults to XDP_ZEROCOPY when the driver is capable and XDP_COPY is not explicitly set in the bind flags, but noting it here anyway.
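For the record, these are the bind flags I mean (fragment using libxdp’s xsk_socket_config; field names are from the library, values here illustrative - and my understanding, which I haven’t verified, is that leaving bind_flags at 0 lets the kernel attempt zero-copy and fall back to copy):

```c
/* Fragment, not compilable standalone: needs xdp/xsk.h from libxdp. */
struct xsk_socket_config cfg = {
	.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
	.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
	.bind_flags = XDP_COPY,	/* force copy; XDP_ZEROCOPY forces zc */
};
```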


Results on MLX5 (hw is ConnectX-4 Lx)

The patch seems to have resolved the issues on MLX5, and I can reliably rx/tx 2Mpps per queue with xdpsock_multi, which is good news. (Same parameters as the E810 testing: poll, busy-poll, zero-copy.)

I unfortunately cannot test beyond 2 queues in my current rig as I have to manually program flow-steering rules for 2 source MAC addresses into 2 different channels, and I only have 2 source ports on my load generator!
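For reference, the steering rules I program look roughly like this (interface name, MACs and queue numbers are placeholders for my rig):

```
# Steer each load-generator source MAC into its own channel/queue.
ethtool -N enp1s0f0 flow-type ether src 02:00:00:00:00:01 action 0
ethtool -N enp1s0f0 flow-type ether src 02:00:00:00:00:02 action 1

# List the installed rules to confirm.
ethtool -n enp1s0f0
```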

I could see if someone can generate a test pattern from 8 source IPs and flow-steer each IP into its own queue, but I’m not sure how useful that is given the comment on RSS above. Let me know?

I will do some more thorough testing on the E810/ICE setup tomorrow to see if I can get a change in behaviour or observe what the exact stall threshold is.

Thanks
Alasdair



> On 11 Aug 2022, at 12:55, Magnus Karlsson <magnus.karlsson@xxxxxxxxx> wrote:
> 
> From: Magnus Karlsson <magnus.karlsson@xxxxxxxxx>
> 
> Fix an issue in XDP_SHARED_UMEM mode together with aligned mode where
> packets are corrupted for the second and any further sockets bound to
> the same umem. In other words, this does not affect the first socket
> bound to the umem. The culprit for this bug is that the initialization
> of the DMA addresses for the pre-populated xsk buffer pool entries was
> not performed for any socket but the first one bound to the umem. Only
> the linear array of DMA addresses was populated. Fix this by
> populating the DMA addresses in the xsk buffer pool for every socket
> bound to the same umem.
> 
> Fixes: 94033cd8e73b8 ("xsk: Optimize for aligned case")
> Reported-by: Alasdair McWilliam <alasdair.mcwilliam@xxxxxxxxxxx>
> Tested-by: Alasdair McWilliam <alasdair.mcwilliam@xxxxxxxxxxx>
> Link: https://lore.kernel.org/xdp-newbies/6205E10C-292E-4995-9D10-409649354226@xxxxxxxxxxx/
> Signed-off-by: Magnus Karlsson <magnus.karlsson@xxxxxxxxx>
> ---
> net/xdp/xsk_buff_pool.c | 14 ++++++++------
> 1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> index f70112176b7c..9b09da63a7c3 100644
> --- a/net/xdp/xsk_buff_pool.c
> +++ b/net/xdp/xsk_buff_pool.c
> @@ -379,6 +379,14 @@ static void xp_check_dma_contiguity(struct xsk_dma_map *dma_map)
> 
> static int xp_init_dma_info(struct xsk_buff_pool *pool, struct xsk_dma_map *dma_map)
> {
> +	u32 i;
> +
> +	for (i = 0; i < pool->heads_cnt; i++) {
> +		struct xdp_buff_xsk *xskb = &pool->heads[i];
> +
> +		xp_init_xskb_dma(xskb, pool, dma_map->dma_pages, xskb->orig_addr);
> +	}
> +
> 	pool->dma_pages = kvcalloc(dma_map->dma_pages_cnt, sizeof(*pool->dma_pages), GFP_KERNEL);
> 	if (!pool->dma_pages)
> 		return -ENOMEM;
> @@ -428,12 +436,6 @@ int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
> 
> 	if (pool->unaligned)
> 		xp_check_dma_contiguity(dma_map);
> -	else
> -		for (i = 0; i < pool->heads_cnt; i++) {
> -			struct xdp_buff_xsk *xskb = &pool->heads[i];
> -
> -			xp_init_xskb_dma(xskb, pool, dma_map->dma_pages, xskb->orig_addr);
> -		}
> 
> 	err = xp_init_dma_info(pool, dma_map);
> 	if (err) {
> 
> base-commit: 46c8229c4317ba8576a206d285a34783390ba6ab
> -- 
> 2.34.1
> 




