On Fri, Jun 16, 2023 at 9:31 AM Jesper Dangaard Brouer <jbrouer@xxxxxxxxxx> wrote:
>
>
>
> On 16/06/2023 13.57, Yunsheng Lin wrote:
> > On 2023/6/16 0:19, Jesper Dangaard Brouer wrote:
> >
> > ...
> >
> >> You have mentioned veth as the use-case. I know I acked adding page_pool
> >> use-case to veth, for when we need to convert an SKB into an
> >> xdp_buff/xdp-frame, but maybe it was the wrong hammer(?).
> >> In this case in veth, the size is known at the page allocation time.
> >> Thus, using the page_pool API is wasting memory. We did this for
> >> performance reasons, but we are not using PP for what it was intended
> >> for. We mostly use page_pool because it is an existing recycle return
> >> path, and we were too lazy to add another alloc-type (see enum
> >> xdp_mem_type).
> >>
> >> Maybe you/we can extend veth to use this dynamic size API, to show us
> >> that this API is a better approach. I will sign up for benchmarking
> >> this (and coordinating with Maryam, CC'ed, as she came with the
> >> use-case we improved on).
> >
> > Thanks, let's find out if page pool is the right hammer for the
> > veth XDP case.
> >
> > Below is the change for veth using the new API in this patch.
> > Only compile tested, as I am not familiar enough with veth XDP and
> > the testing environment for it.
> > Please try it if it is helpful.
> >
> > diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> > index 614f3e3efab0..8850394f1d29 100644
> > --- a/drivers/net/veth.c
> > +++ b/drivers/net/veth.c
> > @@ -736,7 +736,7 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> >          if (skb_shared(skb) || skb_head_is_locked(skb) ||
> >              skb_shinfo(skb)->nr_frags ||
> >              skb_headroom(skb) < XDP_PACKET_HEADROOM) {
> > -                u32 size, len, max_head_size, off;
> > +                u32 size, len, max_head_size, off, truesize, page_offset;
> >                  struct sk_buff *nskb;
> >                  struct page *page;
> >                  int i, head_off;
> > @@ -752,12 +752,15 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> >                  if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
> >                          goto drop;
> >
> > +                size = min_t(u32, skb->len, max_head_size);
> > +                truesize = size;
> > +
> >                  /* Allocate skb head */
> > -                page = page_pool_dev_alloc_pages(rq->page_pool);
> > +                page = page_pool_dev_alloc(rq->page_pool, &page_offset, &truesize);
>
> Maybe rename API to:
>
>    addr = netmem_alloc(rq->page_pool, &truesize);
>
> >                  if (!page)
> >                          goto drop;
> >
> > -                nskb = napi_build_skb(page_address(page), PAGE_SIZE);
> > +                nskb = napi_build_skb(page_address(page) + page_offset, truesize);
>
> IMHO this illustrates that the API is strange/funky.
> (I think this is what Alex Duyck is also pointing out.)
>
> This is the memory (virtual) address "pointer":
>   addr = page_address(page) + page_offset
>
> This is what napi_build_skb() takes as input. (I looked at other users
> of napi_build_skb(), which all take a mem ptr "va" as arg.)
> So, why does your new API provide the "page" and not just the address?
>
> As proposed above:
>   addr = netmem_alloc(rq->page_pool, &truesize);
>
> Maybe the API should be renamed to indicate it isn't returning a "page"?
> We have talked about the name "netmem" before.

Yeah, this is more-or-less what I was getting at. Keep in mind this is
likely the most common case, since most frames passed back and forth
aren't usually much larger than 1500B.
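
Just to make that concrete, the head-allocation hunk above would then
collapse to something along these lines. This is only a rough sketch:
netmem_alloc() is just the name floated above, not an existing API, and
I am assuming truesize stays an in/out parameter as in the proposed
page_pool_dev_alloc():

        void *addr;

        size = min_t(u32, skb->len, max_head_size);
        truesize = size;

        /* Hypothetical helper: returns a virtual address directly, so
         * the caller no longer needs page_address(page) + page_offset.
         */
        addr = netmem_alloc(rq->page_pool, &truesize);
        if (!addr)
                goto drop;

        nskb = napi_build_skb(addr, truesize);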
> >                  if (!nskb) {
> >                          page_pool_put_full_page(rq->page_pool, page, true);
> >                          goto drop;
> > @@ -767,7 +770,6 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> >                  skb_copy_header(nskb, skb);
> >                  skb_mark_for_recycle(nskb);
> >
> > -                size = min_t(u32, skb->len, max_head_size);
> >                  if (skb_copy_bits(skb, 0, nskb->data, size)) {
> >                          consume_skb(nskb);
> >                          goto drop;
> > @@ -782,14 +784,17 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> >                  len = skb->len - off;
> >
> >                  for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
> > -                        page = page_pool_dev_alloc_pages(rq->page_pool);
> > +                        size = min_t(u32, len, PAGE_SIZE);
> > +                        truesize = size;
> > +
> > +                        page = page_pool_dev_alloc(rq->page_pool, &page_offset,
> > +                                                   &truesize);
> >                          if (!page) {
> >                                  consume_skb(nskb);
> >                                  goto drop;
> >                          }
> >
> > -                        size = min_t(u32, len, PAGE_SIZE);
> > -                        skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
> > +                        skb_add_rx_frag(nskb, i, page, page_offset, size, truesize);
>
> Guess this shows the opposite; that the "page" _is_ used by the
> existing API.

This is sort-of the case. One thing that has come up as of late is that
all this stuff is being moved over to folios anyway, getting away from
pages. In addition, I am not sure how often we have to take this path,
as I am not sure how many non-Tx frames end up having fragments added
to them. For something like veth it might be more common, though, since
Tx becomes Rx in this case.

One thought I had on this is that we could look at adding a new
function that abstracts this away and makes use of netmem instead.
Then the whole page/folio thing would be that much further removed.

One other question I have now that I look at this code: why is it
using page_pool and not just a frag cache allocator, or pages
themselves? It doesn't seem like there is a DMA mapping to deal with,
since this is essentially copy-break code. Using page_pool seems
problematic when there is no DMA involved here at all. This could be
more easily handled with just a single page_frag_cache style
allocator.
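
To sketch what I mean for the head allocation (untested, and only meant
to show the shape of it, using the existing napi frag cache helpers;
the trade-off is that the skb_mark_for_recycle()/page_pool recycling
above no longer applies):

        void *va;

        size = min_t(u32, skb->len, max_head_size);
        /* Leave room for the XDP headroom and the shared info. */
        truesize = SKB_DATA_ALIGN(VETH_XDP_HEADROOM + size) +
                   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));

        /* Per-CPU page_frag_cache based allocation; no DMA mapping is
         * set up, which matches this path since the data is copied
         * anyway.
         */
        va = napi_alloc_frag(truesize);
        if (!va)
                goto drop;

        nskb = napi_build_skb(va, truesize);
        if (!nskb) {
                skb_free_frag(va);
                goto drop;
        }
        skb_reserve(nskb, VETH_XDP_HEADROOM);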