Re: [PATCH net-next 3/6] mm/page_alloc: use initial zero offset for page_frag_alloc_align()

Yunsheng Lin <linyunsheng@xxxxxxxxxx> · Mon, 8 Jan 2024 16:59:49 +0800

On 2024/1/5 23:42, Alexander H Duyck wrote:
> On Wed, 2024-01-03 at 17:56 +0800, Yunsheng Lin wrote:
>> The next patch is above to use page_frag_alloc_align() to
>> replace vhost_net_page_frag_refill(), the main difference
>> between those two frag page implementations is whether we
>> use a initial zero offset or not.
>>
>> It seems more nature to use a initial zero offset, as it
>> may enable more correct cache prefetching and skb frag
>> coalescing in the networking, so change it to use initial
>> zero offset.
>>
>> Signed-off-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx>
>> CC: Alexander Duyck <alexander.duyck@xxxxxxxxx>
> 
> There are several advantages to running the offset as a countdown
> rather than count-up value.
> 
> 1. Specifically for the size of the chunks we are allocating doing it
> from the bottom up doesn't add any value as we are jumping in large
> enough amounts and are being used for DMA so being sequential doesn't
> add any value.

What is the expected size of the above chunks in your mind? I suppose
that is like NAPI_HAS_SMALL_PAGE_FRAG to avoid excessive truesize
underestimation?

It seems there is no limit for min size of allocation for
page_frag_alloc_align() now, and as the page_frag API seems to be only
used in networking, should we enforce the min size of allocation at API
level?

> 
> 2. By starting at the end and working toward zero we can use built in
> functionality of the CPU to only have to check and see if our result
> would be signed rather than having to load two registers with the
> values and then compare them which saves us a few cycles. In addition
> it saves us from having to read both the size and the offset for every
> page.

I suppose the above is ok if we only use the page_frag_alloc*() API to
allocate memory for skb->data, not for the frag in skb_shinfo(), as by
starting at the end and working toward zero, it means we can not do skb
coalescing.

As page_frag_alloc*() is returning va now, I am assuming most of users
is using the API for skb->data, I guess it is ok to drop this patch for
now.

If we allow page_frag_alloc*() to return struct page, we might need this
patch to enable coalescing.

> 
> Again this is another code cleanup at the cost of performance. I
> realize many of the items you are removing would be considered micro-
> optimizations but when we are dealing with millions of packets per
> second those optimizations add up.
> .
>