On Fri, Jun 16, 2023 at 11:41 AM Jesper Dangaard Brouer <jbrouer@xxxxxxxxxx> wrote: > > > > On 16/06/2023 19.34, Alexander Duyck wrote: > > On Fri, Jun 16, 2023 at 9:31 AM Jesper Dangaard Brouer > > <jbrouer@xxxxxxxxxx> wrote: > >> > >> > >> > >> On 16/06/2023 13.57, Yunsheng Lin wrote: > >>> On 2023/6/16 0:19, Jesper Dangaard Brouer wrote: [...] > > > >>> if (!nskb) { > >>> page_pool_put_full_page(rq->page_pool, page, true); > >>> goto drop; > >>> @@ -767,7 +770,6 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq, > >>> skb_copy_header(nskb, skb); > >>> skb_mark_for_recycle(nskb); > >>> > >>> - size = min_t(u32, skb->len, max_head_size); > >>> if (skb_copy_bits(skb, 0, nskb->data, size)) { > >>> consume_skb(nskb); > >>> goto drop; > >>> @@ -782,14 +784,17 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq, > >>> len = skb->len - off; > >>> > >>> for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) { > >>> - page = page_pool_dev_alloc_pages(rq->page_pool); > >>> + size = min_t(u32, len, PAGE_SIZE); > >>> + truesize = size; > >>> + > >>> + page = page_pool_dev_alloc(rq->page_pool, &page_offset, > >>> + &truesize); > >>> if (!page) { > >>> consume_skb(nskb); > >>> goto drop; > >>> } > >>> > >>> - size = min_t(u32, len, PAGE_SIZE); > >>> - skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE); > >>> + skb_add_rx_frag(nskb, i, page, page_offset, size, truesize); > >> > >> Guess, this shows the opposite; that the "page" _is_ used by the > >> existing API. > > > > This is a sort-of. One thing that has come up as of late is that all > > this stuff is being moved over to folios anyway and getting away from > > pages. In addition I am not sure how often we are having to take this > > path as I am not sure how many non-Tx frames end up having to have > > fragments added to them. For something like veth it might be more > > common though since Tx becomes Rx in this case. > > I'm thinking, that is it very unlikely that XDP have modified the > fragments. So, why are we allocating and copying the fragments? > Wouldn't it be possible for this veth code to bump the refcnt on these > fragments? (maybe I missed some detail). >From what I can tell this is an exception case with multiple caveats for shared, locked, or nonlinear frames. As such I suspect there may be some deprecated cases in there too since XDP multi-buf support has been around for a while so the code shouldn't need to reallocate to linearize a nonlinear frame. > > > > One other question I have now that I look at this code as well. Why is > > it using page_pool and not just a frag cache allocator, or pages > > themselves? It doesn't seem like it has a DMA mapping to deal with > > since this is essentially copy-break code. Seems problematic that > > there is no DMA involved here at all. This could be more easily > > handled with just a single page_frag_cache style allocator. > > > > Yes, precisely. > I distinctly remember what I tried to poke you and Eric on this approach > earlier, but I cannot find a link to that email. > > I would really appreciate, if you Alex, could give the approach in > veth_convert_skb_to_xdp_buff() some review, as I believe that is a huge > potential for improvements that will lead to large performance > improvements. (I'm sure Maryam will be eager to help re-test performance > for her use-cases). Well just looking at it the quick and dirty answer would be to look at making use of something like page_frag_cache. I won't go into details since it isn't too different from the frag allocator, but it is much simpler since it is just doing reference count hacks instead of having to do the extra overhead to keep the DMA mapping in place. The veth would then just be sitting on at most an order 3 page while it is waiting to fully consume it rather than waiting on a full pool of pages. Alternatively it could do something similar to page_frag_alloc_align itself and just bypass doing a custom allocator. If it went that route it could do something almost like a ring buffer and greatly improve the throughput since it would be able to allocate a higher order page and just copy the entire skb in so the entire thing would be linear rather than having to allocate a bunch of single pages.