On 2023/5/24 0:35, Dragos Tatulea wrote:
>
> On Tue, 2023-05-23 at 17:55 +0200, Jesper Dangaard Brouer wrote:
>>
>> When the mlx5 driver runs an XDP program doing XDP_REDIRECT, then memory
>> is getting leaked. Other XDP actions, like XDP_DROP, XDP_PASS and XDP_TX
>> works correctly. I tested both redirecting back out same mlx5 device and
>> cpumap redirect (with XDP_PASS), which both cause leaking.
>>
>> After removing the XDP prog, which also cause the page_pool to be
>> released by mlx5, then the leaks are visible via the page_pool periodic
>> inflight reports. I have this bpftrace[1] tool that I also use to detect
>> the problem faster (not waiting 60 sec for a report).
>>
>> [1]
>> https://github.com/xdp-project/xdp-project/blob/master/areas/mem/bpftrace/page_pool_track_shutdown01.bt
>>
>> I've been debugging and reading through the code for a couple of days,
>> but I've not found the root-cause, yet. I would appreciate new ideas
>> where to look and fresh eyes on the issue.
>>
>>
>> To Lin, it looks like mlx5 uses PP_FLAG_PAGE_FRAG, and my current
>> suspicion is that mlx5 driver doesn't fully release the bias count (hint
>> see MLX5E_PAGECNT_BIAS_MAX).

It seems mlx5 is implementing its own frag allocation scheme. Is there a
reason why the native frag allocation scheme in page pool is not used? Is it
to avoid the "((page->pp_magic & ~0x3UL) == PP_SIGNATURE)" check?

>>
>
> Thanks for the report Jesper. Incidentally I've just picked up this issue today
> as well.
>
> On XDP redirect and tx, the page is set to skip the bias counter release with
> the expectation that page_pool_put_defragged_page will be called from [1]. But,

page_pool_put_defragged_page() can only be called when there is only one
user of the page, and I am not sure yet how mlx5 can ensure that.

> as I found out now, during XDP redirect only one fragment of the page is
> released in xdp core [2]. This is where the leak is coming from.
>
> We'll provide a fix soon.
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c#n665
>
> [2]
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/net/core/xdp.c#n390
>
> Thanks,
> Dragos
>
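
To make the suspected accounting problem concrete, here is a toy user-space
model of a biased fragment count. It is only a sketch and not driver or
page_pool code: struct toy_page, toy_release(), TOY_BIAS_MAX and both path
functions are hypothetical names made up for illustration, and the bias value
is arbitrary. It assumes a page is recycled only when its fragment count
reaches zero, and compares a path where the unused bias is drained with a path
where the bias release is skipped and only one fragment is dropped.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a driver bias constant; the value is arbitrary. */
#define TOY_BIAS_MAX 64

/* Toy page: not the kernel's struct page. */
struct toy_page {
	long frag_count;	/* outstanding fragment references */
	bool in_pool;		/* true once the page went back to the pool */
};

static void toy_page_init(struct toy_page *p)
{
	p->frag_count = TOY_BIAS_MAX;	/* page starts with the full bias */
	p->in_pool = false;
}

/* Drop nr references; the page is recycled only when the count hits zero. */
static bool toy_release(struct toy_page *p, long nr)
{
	p->frag_count -= nr;
	if (p->frag_count == 0)
		p->in_pool = true;
	return p->in_pool;
}

/* Bias drained: the unused bias is dropped, then the one used fragment. */
static bool toy_driver_path(void)
{
	struct toy_page p;
	long used = 1;

	toy_page_init(&p);
	toy_release(&p, TOY_BIAS_MAX - used);	/* drain the unused bias */
	return toy_release(&p, used);		/* last fragment goes away */
}

/* Bias release skipped: only a single fragment reference is dropped. */
static bool toy_redirect_path(void)
{
	struct toy_page p;

	toy_page_init(&p);
	return toy_release(&p, 1);	/* count stays above zero: page leaks */
}
```

In this model toy_driver_path() returns true (the page is recycled), while
toy_redirect_path() returns false because TOY_BIAS_MAX - 1 references are never
dropped. That is the same shape as the leak described above.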