Re: [PATCH v5 net-next 1/3] net: introduce page_pool pointer in softnet_data percpu struct

On 19/12/2023 16.23, Paolo Abeni wrote:
On Thu, 2023-12-14 at 15:29 +0100, Lorenzo Bianconi wrote:
Allocate percpu page_pools in softnet_data.
Moreover, add a cpuid field to the page_pool struct in order to recycle
the page into the page_pool "hot" cache if napi_pp_put_page() is running
on the same cpu.
This is a preliminary patch to add xdp multi-buff support for xdp running
in generic mode.

Signed-off-by: Lorenzo Bianconi <lorenzo@xxxxxxxxxx>
---
  include/linux/netdevice.h       |  1 +
  include/net/page_pool/helpers.h |  5 +++++
  include/net/page_pool/types.h   |  1 +
  net/core/dev.c                  | 39 ++++++++++++++++++++++++++++++++-
  net/core/page_pool.c            |  5 +++++
  net/core/skbuff.c               |  5 +++--
  6 files changed, 53 insertions(+), 3 deletions(-)

@Jesper, @Ilias: could you please have a look at the pp bits?


I have some concerns... I'm still entertaining the idea, but we need to
be aware of the tradeoffs we are making.

(1)
Adding PP to softnet_data means per-CPU caching of up to 256 pages in the
ptr_ring (plus likely 64 in the alloc cache).  Fortunately, a PP starts
out empty, so as long as this PP isn't used, nothing gets cached.  But
once it is used, PP doesn't have an MM shrinker that removes these cached
pages when the system is under MM pressure.  I guess you can argue that
keeping this per netdev rx-queue would make memory usage even higher.
This is a tradeoff: we are trading memory (waste) for speed.


(2) (Question to Jakub I guess)
How does this connect with Jakub's PP netlink stats interface?
E.g. I find it very practical that this allows us to get PP stats per
netdev, but in this case there isn't a netdev.


(3) (Implicit locking)
PP has a lockless "alloc" path because it relies on the driver's NAPI
context.  The places where the netstack accesses softnet_data provide
similar protection that we can rely on for PP, so this is likely correct
implementation-wise.  But it will give people like Sebastian (Cc) more
gray hair when figuring out how PREEMPT_RT handles these cases.
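
To spell out the implicit protection I mean, a minimal sketch (my own
naming, not the patch's) of how a generic-XDP user would touch the
per-CPU pool:

  /* Runs in a BH-disabled (softirq-like) section, so only one user of this
   * CPU's pool can run at a time -- the same implicit guarantee NAPI gives
   * driver page_pools.  What this maps to under PREEMPT_RT (threaded
   * softirqs, local locks) is exactly the part Sebastian would have to
   * double-check. */
  static struct page *softnet_pp_alloc_page(void)
  {
          struct softnet_data *sd;
          struct page *page;

          local_bh_disable();
          sd = this_cpu_ptr(&softnet_data);
          page = page_pool_dev_alloc_pages(sd->page_pool);
          local_bh_enable();

          return page;
  }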

(4)
The optimization is needed for the case where we need to re-allocate and
copy the SKB fragments.  I think we should focus on avoiding this code
path, instead of optimizing it.  For UDP that should be fairly easy, but
for TCP it is harder.
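
For context, the copy path I'm referring to is triggered by a check along
these lines (simplified, not the exact mainline condition):

  /* Generic XDP must linearize/copy an skb it cannot hand to the BPF
   * program as-is.  GRO'ed TCP skbs are typically nonlinear (frags /
   * frag_list), which is why TCP is the hard case compared to UDP. */
  static bool generic_xdp_needs_copy(const struct sk_buff *skb)
  {
          return skb_cloned(skb) ||
                 skb_is_nonlinear(skb) ||
                 skb_headroom(skb) < XDP_PACKET_HEADROOM;
  }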

--Jesper



