On 16/06/2023 19.34, Alexander Duyck wrote:
On Fri, Jun 16, 2023 at 9:31 AM Jesper Dangaard Brouer
<jbrouer@xxxxxxxxxx> wrote:
On 16/06/2023 13.57, Yunsheng Lin wrote:
On 2023/6/16 0:19, Jesper Dangaard Brouer wrote:
...
You have mentioned veth as the use-case. I know I acked adding page_pool
use-case to veth, for when we need to convert an SKB into an
xdp_buff/xdp-frame, but maybe it was the wrong hammer(?).
In this case in veth, the size is known at the page allocation time.
Thus, using the page_pool API is wasting memory. We did this for
performance reasons, but we are not using PP for what is was intended
for. We mostly use page_pool, because it an existing recycle return
path, and we were too lazy to add another alloc-type (see enum
xdp_mem_type).
Maybe you/we can extend veth to use this dynamic size API, to show us
that this is API is a better approach. I will signup for benchmarking
this (and coordinating with CC Maryam as she came with use-case we
improved on).
Thanks, let's find out if page pool is the right hammer for the
veth XDP case.
Below is the change for veth using the new api in this patch.
Only compile test as I am not familiar enough with veth XDP and
testing environment for it.
Please try it if it is helpful.
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 614f3e3efab0..8850394f1d29 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -736,7 +736,7 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
if (skb_shared(skb) || skb_head_is_locked(skb) ||
skb_shinfo(skb)->nr_frags ||
skb_headroom(skb) < XDP_PACKET_HEADROOM) {
- u32 size, len, max_head_size, off;
+ u32 size, len, max_head_size, off, truesize, page_offset;
struct sk_buff *nskb;
struct page *page;
int i, head_off;
@@ -752,12 +752,15 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
goto drop;
+ size = min_t(u32, skb->len, max_head_size);
+ truesize = size;
+
/* Allocate skb head */
- page = page_pool_dev_alloc_pages(rq->page_pool);
+ page = page_pool_dev_alloc(rq->page_pool, &page_offset, &truesize);
Maybe rename API to:
addr = netmem_alloc(rq->page_pool, &truesize);
if (!page)
goto drop;
- nskb = napi_build_skb(page_address(page), PAGE_SIZE);
+ nskb = napi_build_skb(page_address(page) + page_offset, truesize);
IMHO this illustrates that API is strange/funky.
(I think this is what Alex Duyck is also pointing out).
This is the memory (virtual) address "pointer":
addr = page_address(page) + page_offset
This is what napi_build_skb() takes as input. (I looked at other users
of napi_build_skb() whom all give a mem ptr "va" as arg.)
So, why does your new API provide the "page" and not just the address?
As proposed above:
addr = netmem_alloc(rq->page_pool, &truesize);
Maybe the API should be renamed, to indicate this isn't returning a "page"?
We have talked about the name "netmem" before.
Yeah, this is more-or-less what I was getting at. Keep in mind this is
likely the most common case since most frames passed and forth aren't
ever usually much larger than 1500B.
Good to get confirmed this is "more-or-less" your suggestion/direction.
if (!nskb) {
page_pool_put_full_page(rq->page_pool, page, true);
goto drop;
@@ -767,7 +770,6 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
skb_copy_header(nskb, skb);
skb_mark_for_recycle(nskb);
- size = min_t(u32, skb->len, max_head_size);
if (skb_copy_bits(skb, 0, nskb->data, size)) {
consume_skb(nskb);
goto drop;
@@ -782,14 +784,17 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
len = skb->len - off;
for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
- page = page_pool_dev_alloc_pages(rq->page_pool);
+ size = min_t(u32, len, PAGE_SIZE);
+ truesize = size;
+
+ page = page_pool_dev_alloc(rq->page_pool, &page_offset,
+ &truesize);
if (!page) {
consume_skb(nskb);
goto drop;
}
- size = min_t(u32, len, PAGE_SIZE);
- skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
+ skb_add_rx_frag(nskb, i, page, page_offset, size, truesize);
Guess, this shows the opposite; that the "page" _is_ used by the
existing API.
This is a sort-of. One thing that has come up as of late is that all
this stuff is being moved over to folios anyway and getting away from
pages. In addition I am not sure how often we are having to take this
path as I am not sure how many non-Tx frames end up having to have
fragments added to them. For something like veth it might be more
common though since Tx becomes Rx in this case.
I'm thinking, that is it very unlikely that XDP have modified the
fragments. So, why are we allocating and copying the fragments?
Wouldn't it be possible for this veth code to bump the refcnt on these
fragments? (maybe I missed some detail).
One thought I had on this is that we could look at adding a new
function that abstracts this away and makes use of netmem instead.
Then the whole page/folio thing would be that much further removed.
I like this "thought" of yours :-)
One other question I have now that I look at this code as well. Why is
it using page_pool and not just a frag cache allocator, or pages
themselves? It doesn't seem like it has a DMA mapping to deal with
since this is essentially copy-break code. Seems problematic that
there is no DMA involved here at all. This could be more easily
handled with just a single page_frag_cache style allocator.
Yes, precisely.
I distinctly remember what I tried to poke you and Eric on this approach
earlier, but I cannot find a link to that email.
I would really appreciate, if you Alex, could give the approach in
veth_convert_skb_to_xdp_buff() some review, as I believe that is a huge
potential for improvements that will lead to large performance
improvements. (I'm sure Maryam will be eager to help re-test performance
for her use-cases).
--Jesper