Re: Bad XDP performance with mlx5

On Fri, 31 May 2019 18:06:01 +0000 Saeed Mahameed <saeedm@xxxxxxxxxxxx> wrote:

> On Fri, 2019-05-31 at 18:18 +0200, Jesper Dangaard Brouer wrote:
[...]
> > 
> > To understand why this is happening, you first have to know the
> > difference between the two RX-memory modes used by mlx5 for non-XDP
> > vs. XDP. With non-XDP, two frames are stored per memory page, while
> > for XDP only a single frame per page is used.  The number of packets
> > available in the RX-rings is actually the same, as the ring sizes
> > are non-XDP=512 vs. XDP=1024.
> 
> Thanks Jesper! That was a well put together explanation.
> I want to point out that some other drivers use the alloc_skb APIs,
> which provide a good caching mechanism, even better than the
> mlx5-internal one (which uses the alloc_page APIs directly). This can
> explain the difference, and your explanation shows the root cause of
> the higher CPU utilization with XDP on mlx5, since the mlx5 page
> cache works at half of its capacity when XDP is enabled.
> 
> Now, do we really need to keep this page-per-packet scheme in mlx5
> when XDP is enabled? I think it is time to drop that ...

No, we need to keep the page per packet (at least until I've solved
some corner-cases with page_pool, which will likely require getting a
page-flag).

> > I believe the real issue is that TCP uses SKB->truesize (based on
> > the frame size) for its memory-pressure and window calculations,
> > which is why increasing the window size manually solved the issue.

The TCP performance issue is not solely an SKB->truesize issue, but
also an issue with how the driver-level page-cache works.  It is
actually very fragile, as a single page with an elevated refcnt can
block the cache (see mlx5e_rx_cache_get()).  This easily happens with
TCP packets that are waiting to be re-transmitted in case of loss.
That is what is happening here, as indicated by rx_cache_busy and
rx_cache_full being (almost) the same.

We (Ilias, Tariq and I) have been planning to remove this small driver
cache, and instead use the page_pool, and create a page-return path for
SKBs.  Which should make this problem go away.  I'm going to be working
on this the next couple of weeks (the tricky part is all the corner
cases).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

On Fri, 31 May 2019 18:18:17 +0200
Jesper Dangaard Brouer <brouer@xxxxxxxxxx> wrote:

> It was clear that the mlx5 driver page-cache was not working:
>  Ethtool(mlx5p1  ) stat:     6653761 (   6,653,761) <= rx_cache_busy /sec
>  Ethtool(mlx5p1  ) stat:     6653732 (   6,653,732) <= rx_cache_full /sec
>  Ethtool(mlx5p1  ) stat:      669481 (     669,481) <= rx_cache_reuse /sec
>  Ethtool(mlx5p1  ) stat:           1 (           1) <= rx_congst_umr /sec
>  Ethtool(mlx5p1  ) stat:     7323230 (   7,323,230) <= rx_csum_unnecessary /sec
>  Ethtool(mlx5p1  ) stat:        1034 (       1,034) <= rx_discards_phy /sec
>  Ethtool(mlx5p1  ) stat:     7323230 (   7,323,230) <= rx_packets /sec
>  Ethtool(mlx5p1  ) stat:     7324244 (   7,324,244) <= rx_packets_phy /sec
