Re: mlx5 XDP redirect leaking memory on kernel 6.3

On 2023/5/24 20:03, Dragos Tatulea wrote:
> On Wed, 2023-05-24 at 19:26 +0800, Yunsheng Lin wrote:
>> On 2023/5/24 0:35, Dragos Tatulea wrote:
>>>
>>> On Tue, 2023-05-23 at 17:55 +0200, Jesper Dangaard Brouer wrote:
>>>>
>>>> When the mlx5 driver runs an XDP program doing XDP_REDIRECT, memory
>>>> gets leaked. Other XDP actions, like XDP_DROP, XDP_PASS and XDP_TX,
>>>> work correctly. I tested both redirecting back out the same mlx5
>>>> device and cpumap redirect (with XDP_PASS), both of which leak.
>>>>
>>>> After removing the XDP prog, which also causes the page_pool to be
>>>> released by mlx5, the leaks become visible via the page_pool's
>>>> periodic inflight reports. I have this bpftrace[1] tool that I also
>>>> use to detect the problem faster (without waiting 60 sec for a report).
>>>>
>>>>   [1] 
>>>> https://github.com/xdp-project/xdp-project/blob/master/areas/mem/bpftrace/page_pool_track_shutdown01.bt
>>>>
>>>> I've been debugging and reading through the code for a couple of
>>>> days, but I've not found the root cause yet. I would appreciate new
>>>> ideas on where to look and fresh eyes on the issue.
>>>>
>>>>
>>>> To Lin: it looks like mlx5 uses PP_FLAG_PAGE_FRAG, and my current
>>>> suspicion is that the mlx5 driver doesn't fully release the bias
>>>> count (hint: see MLX5E_PAGECNT_BIAS_MAX).
>>
>> It seems mlx5 is implementing its own frag allocation scheme; is there
>> a reason why the native frag allocation scheme in page pool is not
>> used? To avoid the "((page->pp_magic & ~0x3UL) == PP_SIGNATURE)"
>> checking?
> 
> mlx5 fragments the page from within the driver instead of
> pre-partitioning the page using page_pool_alloc_frag(), as enabled by
> commit 52cc6ffc0ab2 ("page_pool: Refactor page_pool to enable
> fragmenting after allocation").

The page_pool_alloc_frag() API does allow a driver to allocate a
different number of frags from the same page by specifying a different
'size' on each call.
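
For illustration, here is a minimal sketch (a hypothetical
rx_alloc_two_frags(), not actual mlx5 code) of carving two differently
sized frags out of the same page; 'pool' is assumed to be a page_pool
created with PP_FLAG_PAGE_FRAG set in page_pool_params.flags:

	#include <net/page_pool.h>

	static int rx_alloc_two_frags(struct page_pool *pool)
	{
		unsigned int offset;
		struct page *page;

		/* first buffer: 2048 bytes */
		page = page_pool_alloc_frag(pool, &offset, 2048, GFP_ATOMIC);
		if (!page)
			return -ENOMEM;
		/* buffer #1 starts at page_address(page) + offset */

		/* second buffer: a different size; it may share the
		 * same page at a new offset, or come from a fresh page
		 */
		page = page_pool_alloc_frag(pool, &offset, 1024, GFP_ATOMIC);
		if (!page)
			return -ENOMEM;

		return 0;
	}

The pool handles the per-frag refcounting, so each buffer is later
returned with a plain page_pool_put_page() and no driver-side bias
counting is needed.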

> 
> The exception is however the following optimization:

The RFC below may make it possible to support that optimization through
the frag API too:
https://patchwork.kernel.org/project/netdevbpf/cover/20230516124801.2465-1-linyunsheng@xxxxxxxxxx/

> page_pool_put_defragged_page() can be called for XDP_TX directly to avoid the
> overhead of fragment management. That's because mlx5 currently supports only one
> packet per page for XDP.
> 
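
For reference, a minimal sketch of that direct path (assuming NAPI
context, where allow_direct recycling is safe; passing -1 asks the
pool to DMA-sync the full buffer):

	/* XDP_TX completion, one packet per page: skip the frag
	 * accounting and return the page to the pool directly.
	 */
	page_pool_put_defragged_page(pool, page, -1, true);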

It seems almost everyone is doing one packet per page for XDP, but that
is not very memory-efficient: for the usual case of a 1.5K MTU with a
4K page size, two packets could share a page if the XDP headroom were
reduced a little bit, not to mention the 64K page size case.
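
Rough back-of-envelope numbers (assuming x86_64, where the
skb_shared_info tailroom is about 320B after SKB_DATA_ALIGN(), and a
1536B buffer for a 1.5K MTU frame):

	256 (XDP_PACKET_HEADROOM) + 1536 + ~320 = ~2112B per buffer
	2 * 2112 = 4224B > 4096B, so only one buffer fits in a 4K page

	192 (reduced headroom)    + 1536 + ~320 = ~2048B per buffer
	2 * 2048 = 4096B, so two buffers fit exactly in a 4K page

So trimming the default 256B headroom to 192B would already be enough
to halve the memory footprint in the 1.5K MTU / 4K page case.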



