Re: mlx5 XDP redirect leaking memory on kernel 6.3

On 23/05/2023 18.35, Dragos Tatulea wrote:

On Tue, 2023-05-23 at 17:55 +0200, Jesper Dangaard Brouer wrote:

When the mlx5 driver runs an XDP program doing XDP_REDIRECT, memory is
leaked. Other XDP actions, like XDP_DROP, XDP_PASS and XDP_TX, work
correctly. I tested both redirecting back out the same mlx5 device and
cpumap redirect (with XDP_PASS); both leak.

After removing the XDP prog, which also causes the page_pool to be
released by mlx5, the leaks are visible via the page_pool periodic
inflight reports. I have this bpftrace[1] tool that I also use to detect
the problem faster (without waiting 60 sec for a report).

   [1]
https://github.com/xdp-project/xdp-project/blob/master/areas/mem/bpftrace/page_pool_track_shutdown01.bt

I've been debugging and reading through the code for a couple of days,
but I've not found the root cause yet. I would appreciate new ideas on
where to look and fresh eyes on the issue.


To Lin: it looks like mlx5 uses PP_FLAG_PAGE_FRAG, and my current
suspicion is that the mlx5 driver doesn't fully release the bias count
(hint: see MLX5E_PAGECNT_BIAS_MAX).


Thanks for the report, Jesper. Incidentally, I've just picked up this issue
today as well.

On XDP redirect and tx, the page is set to skip the bias counter release with
the expectation that page_pool_put_defragged_page will be called from [1]. But,
as I found out now, during XDP redirect only one fragment of the page is
released in xdp core [2]. This is where the leak is coming from.
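
The accounting described above can be illustrated with a small user-space
model. This is only a sketch, not kernel code: struct model_page,
model_fragment_page() and model_release_frags() are hypothetical stand-ins
for page->pp_frag_count, page_pool_fragment_page() and the release paths,
and it assumes a single fragment is in flight. Only the value 64
(MLX5E_PAGECNT_BIAS_MAX) is taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>

#define BIAS_MAX 64 /* same value as MLX5E_PAGECNT_BIAS_MAX */

/* Hypothetical stand-in for struct page: only the frag counter is modelled. */
struct model_page {
	long pp_frag_count;
	bool recycled;
};

/* Models page_pool_fragment_page(): preload the counter with the bias. */
static void model_fragment_page(struct model_page *page, long nr)
{
	page->pp_frag_count = nr;
	page->recycled = false;
}

/* Drop nr fragment references; the page goes back to the pool only at 0. */
static void model_release_frags(struct model_page *page, long nr)
{
	assert(nr <= page->pp_frag_count); /* never release more than held */
	page->pp_frag_count -= nr;
	if (page->pp_frag_count == 0)
		page->recycled = true;
}

/* Leaky path: bias release skipped, only one fragment is dropped. */
bool redirect_without_bias_release(void)
{
	struct model_page page;

	model_fragment_page(&page, BIAS_MAX);
	model_release_frags(&page, 1);
	return page.recycled;
}

/* Balanced path: unused bias is dropped before the in-use fragment. */
bool redirect_with_bias_release(void)
{
	struct model_page page;
	long used = 1; /* fragments actually handed out (assumed) */

	model_fragment_page(&page, BIAS_MAX);
	model_release_frags(&page, BIAS_MAX - used);
	model_release_frags(&page, used);
	return page.recycled;
}
```

In this model redirect_without_bias_release() returns false, because 63
references stay outstanding and the page is never recycled, while
redirect_with_bias_release() returns true. It only illustrates the
counting and says nothing about what the actual fix should look like.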


Ohh, I guess I see the problem now. (As Lin also says indirectly) the
page_pool_put_defragged_page() call is not allowed or not intended to be
invoked directly.

In [1] the driver actually frees a PP page that has been fragmented (via
page_pool_fragment_page), but not "defragged" yet. That means
page->pp_frag_count will still be 64 (MLX5E_PAGECNT_BIAS_MAX).

I thought about catching this invalid API usage in page_pool, but due to
an (atomic_read) optimization (in page_pool_defrag_page), we cannot
detect this reliably.
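
Why the read shortcut gets in the way can be sketched in user space. This
is a simplified model, not the kernel function: it assumes the
optimization returns 0 without updating the counter when the caller
releases all remaining references, and struct frag_page,
model_defrag_page() and naive_misuse_check() are hypothetical names
introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pp_frag_count field of struct page. */
struct frag_page {
	long pp_frag_count;
};

/*
 * Assumed shape of the shortcut: if the caller releases every remaining
 * reference, report 0 but leave the stored counter untouched.
 */
static long model_defrag_page(struct frag_page *page, long nr)
{
	if (page->pp_frag_count == nr)
		return 0; /* plain read, no write back */

	page->pp_frag_count -= nr;
	assert(page->pp_frag_count >= 0); /* releasing too much is a bug */
	return page->pp_frag_count;
}

/* Naive check: "a non-zero counter means the page was never defragged". */
static bool naive_misuse_check(const struct frag_page *page)
{
	return page->pp_frag_count != 0;
}

/* Valid usage: all 64 references are released before the final put. */
bool valid_usage_is_flagged(void)
{
	struct frag_page page = { .pp_frag_count = 64 };
	long remaining = model_defrag_page(&page, 64);

	return remaining == 0 && naive_misuse_check(&page);
}

/* Invalid usage: the page is put while still fully fragmented. */
bool invalid_usage_is_flagged(void)
{
	struct frag_page page = { .pp_frag_count = 64 };

	return naive_misuse_check(&page);
}
```

Both helpers return true in this model: after a valid final release the
stored counter is still 64, so a check on the counter value alone flags
the correct caller just like the incorrect one.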

We'll provide a fix soon.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c#n665

[2]
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/net/core/xdp.c#n390




