Re: [PATCH] net: ethernet: ti: am65-cpsw: Add minimal XDP support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/29/24 17:46, Andrew Lunn wrote:
On Thu, Feb 29, 2024 at 05:19:44PM +0100, Julien Panis wrote:
On 2/27/24 00:18, Andrew Lunn wrote:
+static struct sk_buff *am65_cpsw_alloc_skb(struct net_device *ndev, unsigned int len)
+{
+	struct page *page;
+	struct sk_buff *skb;
+
+	page = dev_alloc_pages(0);
You are likely to get better performance if you use the page_pool.

When FEC added XDP support, the first set of changes was to make use
of page_pool. That improved the drivers performance. Then XDP was
added on top. Maybe you can follow that pattern.

        Andrew
Hello Andrew,

Thanks for this suggestion. I've been working on it over the last days.
I encountered issues and I begin to wonder if that's a good option for
this driver. Let me explain...
I'm not a page pool expert, so hopefully those that are will jump in
and help.

This device has 3 ports:
- Port0 is the host port (internal to the subsystem and referred as CPPI
in the driver).
- Port1 and 2 are the external ethernet ports.
Each port's RX FIFO has 1 queue.

As mentioned in page_pool documentation:
https://docs.kernel.org/networking/page_pool.html
"The number of pools created MUST match the number of hardware
queues unless hardware restrictions make that impossible. This would
otherwise beat the purpose of page pool, which is allocate pages fast
from cache without locking."
My guess is, this last bit is the important part. Locking. Do ports 1
and port 2 rx and tx run in parallel on different CPUs? Hence do you
need locking?

No.


So, for this driver I should use 2 page pools (for port1 and 2) if possible.
Maybe, maybe not. It is not really the number of front panel interface
which matters here. It is the number of entities which need buffers.

But, if I I replace any alloc_skb() with page_pool_alloc() in the original
driver, I will have to create only 1 page_pool.
This is visible in am65_cpsw_nuss_common_open(), which starts with:
"if (common->usage_count) return 0;" to ensure that subsequent code
will be executed only once even if the 2 interfaces are ndo_open'd.
IOW, skbs will be allocated for only 1 RX channel. I checked that behavior,
and that's the way it works.
This is because the host port (CPPI) has only 1 RX queue. This single
queue is used to get all the packets, from both Ports and 2 (port ID is
stored in each descriptor).
So you have one entity which needs buffers. CPPI. It seems like Port1
and Port2 do not need buffers? So to me, you need one page pool.

Yes, only one entity (CPPI) needs buffers.


So, because of this constraint, I ended up working on the "single
page pool" option.

Questions:
1) Is the behavior described above usual for ethernet switch devices ?
This might be the first time page pool has been used by a switch? I
would check the melanox and sparx5 driver and see if they use page
pool.

It seems that sparx5 does not use page pools, mellanox does.


2) Given that I can't use a page pool for each HW queue, is it worth
using the page pool memory model ?
3) Currently I use 2 xdp_rxq (one for port1 and another one for port2).
If an XDP program is attached to port1, my page pool dma parameter
will need to be DMA_BIDIRECTIONAL (because of XDP_TX). As a result,
even if there is no XDP program attached to port2, the setting for page
pool will need to be DMA_BIDIRECTIONAL instead of DMA_FROM_DEVICE.
In such situation, isn't it a problem for port2 ?
4) Because of 1) and 2), will page pool performance really be better for
this driver ?
You probably need to explain the TX architecture a bit. How are
packets passed to the hardware? Is it again via a single CPPI entity?
Or are there two entities?

Yes, packets are passed to the hardware via a single CPPI entity.

DMA_BIDIRECTIONAL and DMA_FROM_DEVICE is about cache flushing and
invalidation. Before you pass a buffer to the hardware for it to
receive into, you need to invalidate the cache. That means when the
hardware gives the buffer back with a packet in it, there is a cache
miss and it fetches the new data from memory. If that packet gets
turned into an XDP_TX, you need to flush the cache to force any
changes out of the cache and into memory. The DMA from memory then
gets the up to date packet contents.

My guess would be, always using DMA_BIDIRECTIONAL is fine, so long as
your cache operations are correct. Turn on DMA_API_DEBUG and make sure
it is happy.

      Andrew

Thank you for all these explanations.
I'll carry on working on this single page pool option, so.

Julien



[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux