Re: [RFC net-next] net: veth: reduce page_pool memory footprint using half page per-buffer

Yunsheng Lin <linyunsheng@xxxxxxxxxx> · Thu, 18 May 2023 09:16:12 +0800

On 2023/5/17 22:17, Lorenzo Bianconi wrote:
>> Maybe using the new frag interface introduced in [1] would bring
>> back the performance for the MTU 8000B case.
>>
>> 1. https://patchwork.kernel.org/project/netdevbpf/cover/20230516124801.2465-1-linyunsheng@xxxxxxxxxx/
>>
>>
>> I drafted a patch for veth to use the new frag interface; maybe that
>> will show how veth can make use of it. Would you give it a try to see
>> if there is any performance improvement for the MTU 8000B case? Thanks.
>>
>> --- a/drivers/net/veth.c
>> +++ b/drivers/net/veth.c
>> @@ -737,8 +737,8 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>>             skb_shinfo(skb)->nr_frags ||
>>             skb_headroom(skb) < XDP_PACKET_HEADROOM) {
>>                 u32 size, len, max_head_size, off;
>> +               struct page_pool_frag *pp_frag;
>>                 struct sk_buff *nskb;
>> -               struct page *page;
>>                 int i, head_off;
>>
>>                 /* We need a private copy of the skb and data buffers since
>> @@ -752,14 +752,20 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>>                 if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
>>                         goto drop;
>>
>> +               size = min_t(u32, skb->len, max_head_size);
>> +               size += VETH_XDP_HEADROOM;
>> +               size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>> +
>>                 /* Allocate skb head */
>> -               page = page_pool_dev_alloc_pages(rq->page_pool);
>> -               if (!page)
>> +               pp_frag = page_pool_dev_alloc_frag(rq->page_pool, size);
>> +               if (!pp_frag)
>>                         goto drop;
>>
>> -               nskb = napi_build_skb(page_address(page), PAGE_SIZE);
>> +               nskb = napi_build_skb(page_address(pp_frag->page) + pp_frag->offset,
>> +                                     pp_frag->truesize);
>>                 if (!nskb) {
>> -                       page_pool_put_full_page(rq->page_pool, page, true);
>> +                       page_pool_put_full_page(rq->page_pool, pp_frag->page,
>> +                                               true);
>>                         goto drop;
>>                 }
>>
>> @@ -782,16 +788,18 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>>                 len = skb->len - off;
>>
>>                 for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
>> -                       page = page_pool_dev_alloc_pages(rq->page_pool);
>> -                       if (!page) {
>> +                       size = min_t(u32, len, PAGE_SIZE);
>> +
>> +                       pp_frag = page_pool_dev_alloc_frag(rq->page_pool, size);
>> +                       if (!pp_frag) {
>>                                 consume_skb(nskb);
>>                                 goto drop;
>>                         }
>>
>> -                       size = min_t(u32, len, PAGE_SIZE);
>> -                       skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
>> -                       if (skb_copy_bits(skb, off, page_address(page),
>> -                                         size)) {
>> +                       skb_add_rx_frag(nskb, i, pp_frag->page, pp_frag->offset,
>> +                                       size, pp_frag->truesize);
>> +                       if (skb_copy_bits(skb, off, page_address(pp_frag->page) +
>> +                                         pp_frag->offset, size)) {
>>                                 consume_skb(nskb);
>>                                 goto drop;
>>                         }
>> @@ -1047,6 +1055,8 @@ static int veth_create_page_pool(struct veth_rq *rq)
>>                 return err;
>>         }
> 
> IIUC the code here, we are using a variable length for the linear part (at most one page)
> while we will always use a full page (except for the last fragment) for the paged

More precisely, it does not care whether the data is in the linear part or in the paged area.
We copy the data to the new skb using the fewest possible fragments and the least memory,
depending on the head/tail room size and the page size/order, since skb_copy_bits() hides
the data layout difference from its caller.
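
To make the memory-saving argument concrete, here is a rough userspace sketch
(not kernel code) comparing the bytes consumed per packet when every buffer is
a full page against sizing each buffer to what is actually copied. All
constants and helper names below are assumptions for illustration: 4K pages,
256 bytes standing in for VETH_XDP_HEADROOM, 320 bytes standing in for
SKB_DATA_ALIGN(sizeof(struct skb_shared_info)), and a hypothetical 64-byte
allocation granularity. It does not model page_pool internals or the
max_frag_size setting.

```c
#include <stdint.h>

/* Assumed values for illustration only; real ones depend on arch/config. */
#define PAGE_SZ      4096u /* assumes 4K pages */
#define XDP_HEADROOM 256u  /* stands in for VETH_XDP_HEADROOM */
#define SHINFO_SZ    320u  /* stands in for the aligned skb_shared_info size */
#define FRAG_ALIGN   64u   /* hypothetical allocation granularity */

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Round v up to the next multiple of a. */
static uint32_t align_up(uint32_t v, uint32_t a) { return (v + a - 1) / a * a; }

/* Payload bytes that fit in the head buffer next to headroom and shinfo. */
static uint32_t max_head_size(void)
{
    return PAGE_SZ - XDP_HEADROOM - SHINFO_SZ;
}

/* Bytes consumed when the head and every fragment each take a full page. */
uint32_t full_page_footprint(uint32_t len)
{
    uint32_t head = min_u32(len, max_head_size());
    uint32_t rest = len - head;
    uint32_t pages = 1 + (rest + PAGE_SZ - 1) / PAGE_SZ;

    return pages * PAGE_SZ;
}

/* Bytes consumed when each buffer is sized to the data it holds. */
uint32_t frag_footprint(uint32_t len)
{
    uint32_t head = min_u32(len, max_head_size());
    uint32_t total = align_up(head + XDP_HEADROOM + SHINFO_SZ, FRAG_ALIGN);
    uint32_t rest = len - head;

    while (rest) {
        uint32_t size = min_u32(rest, PAGE_SZ);

        total += align_up(size, FRAG_ALIGN);
        rest -= size;
    }
    return total;
}
```

Under these assumed constants, an 8000-byte packet takes 12288 bytes in the
full-page model and 8576 bytes in the fragment-sized model; only the last
buffer shrinks, which matches the "full page except for the last fragment"
reading above.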

> area, correct? I have not tested it yet but I do not think we will get a significant
> improvement since if we set MTU to 8000B in my tests we get mostly the same throughput
> (14.5 Gbps vs 14.7 Gbps) if we use page_pool fragment or page_pool full page.
> Am I missing something?

I don't expect a significant improvement either, but I do expect a 'nice improvement' in
performance and memory saving, depending on how you view 'nice improvement' :)

> What we are discussing with Jesper is to try to allocate an order-3 page from the pool and
> rely on page_pool fragments, similar to what page_frag_cache is doing. I will look into it if
> there are no strong 'red flags'.

Thanks.
Yes, if we do not really care about memory usage, using an order-3 page should give a larger
performance improvement.
As I understand it, the improvement mentioned above also applies to order-3 pages.

> 
> Regards,
> Lorenzo
> 
>>
>> +       page_pool_set_max_frag_size(rq->page_pool, PAGE_SIZE / 2);
>> +
>>         return 0;
>>  }
>>