On 2023/6/17 1:34, Alexander Duyck wrote:

...

>>>
>>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
>>> index 614f3e3efab0..8850394f1d29 100644
>>> --- a/drivers/net/veth.c
>>> +++ b/drivers/net/veth.c
>>> @@ -736,7 +736,7 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>>>  	if (skb_shared(skb) || skb_head_is_locked(skb) ||
>>>  	    skb_shinfo(skb)->nr_frags ||
>>>  	    skb_headroom(skb) < XDP_PACKET_HEADROOM) {
>>> -		u32 size, len, max_head_size, off;
>>> +		u32 size, len, max_head_size, off, truesize, page_offset;
>>>  		struct sk_buff *nskb;
>>>  		struct page *page;
>>>  		int i, head_off;
>>> @@ -752,12 +752,15 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>>>  		if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
>>>  			goto drop;
>>>
>>> +		size = min_t(u32, skb->len, max_head_size);
>>> +		truesize = size;
>>> +
>>>  		/* Allocate skb head */
>>> -		page = page_pool_dev_alloc_pages(rq->page_pool);
>>> +		page = page_pool_dev_alloc(rq->page_pool, &page_offset, &truesize);
>>
>> Maybe rename API to:
>>
>>   addr = netmem_alloc(rq->page_pool, &truesize);

Unless we create a subsystem called netmem, I am not sure about the
'netmem' name; it seems more confusing to use it here.

>>
>>>  		if (!page)
>>>  			goto drop;
>>>
>>> -		nskb = napi_build_skb(page_address(page), PAGE_SIZE);
>>> +		nskb = napi_build_skb(page_address(page) + page_offset, truesize);
>>
>> IMHO this illustrates that API is strange/funky.
>> (I think this is what Alex Duyck is also pointing out).
>>
>> This is the memory (virtual) address "pointer":
>>   addr = page_address(page) + page_offset
>>
>> This is what napi_build_skb() takes as input. (I looked at other users
>> of napi_build_skb() whom all give a mem ptr "va" as arg.)
>> So, why does your new API provide the "page" and not just the address?
>>
>> As proposed above:
>>   addr = netmem_alloc(rq->page_pool, &truesize);
>>
>> Maybe the API should be renamed, to indicate this isn't returning a "page"?
>> We have talked about the name "netmem" before.
>
> Yeah, this is more-or-less what I was getting at. Keep in mind this is
> likely the most common case since most frames passed back and forth
> aren't ever usually much larger than 1500B.

I do feel the pain here; that is why the RFC of this patchset used a
per-cpu 'struct page_pool_frag' to report the result back to the user,
so that both 'va' and 'page' could be returned (see the wrapper sketch
appended at the end of this mail).

IMHO, compared to the above point, it is more important that we have a
unified implementation for both of them, instead of the page frag
implementations based on the page allocator. Currently there are three
implementations of page frag:

1. mm/page_alloc.c: the net stack seems to use it on the rx side, with
   'struct page_frag_cache' and page_frag_alloc_align() as the main API.
2. net/core/sock.c: the net stack seems to use it on the tx side, with
   'struct page_frag' and skb_page_frag_refill() as the main API.
3. drivers/vhost/net.c: vhost seems to use it to build xdp frames, and
   its implementation seems to be a mix of the above two.

A rough usage comparison of the first two is also appended at the end
of this mail.

Actually I have a patchset removing the third one, waiting to be sent
out after this one. And I wonder if the first and second ones can be
unified, as the only user-facing difference seems to be that one
returns a va and the other returns a page; the other differences seem
implementation specific, for example, one increments the offset while
the other decrements it.
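
For the "return an address instead of a page" point above, something
along the lines below could sit on top of the new page_pool_dev_alloc()
so that callers like veth can feed napi_build_skb() directly. This is
only a rough sketch; the wrapper name page_pool_dev_alloc_va() is made
up here for illustration and is not part of the posted patches:

	/* Sketch only: page_pool_dev_alloc_va() is an invented name, not
	 * part of this series.  It wraps the new page_pool_dev_alloc()
	 * and hands back the mapped virtual address instead of the
	 * page + offset pair.
	 */
	static inline void *page_pool_dev_alloc_va(struct page_pool *pool,
						   unsigned int *truesize)
	{
		unsigned int offset;
		struct page *page;

		page = page_pool_dev_alloc(pool, &offset, truesize);
		if (!page)
			return NULL;

		return page_address(page) + offset;
	}

With something like that, the veth hunk would read like the other
napi_build_skb() users:

	addr = page_pool_dev_alloc_va(rq->page_pool, &truesize);
	if (!addr)
		goto drop;

	nskb = napi_build_skb(addr, truesize);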
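
And to spell out the "va vs page" difference between implementations 1
and 2 above, here is a rough usage sketch of the two existing
interfaces (error handling trimmed; the example function names are made
up for illustration):

	/* 1. rx-side style (mm/page_alloc.c): the caller gets a virtual
	 * address back.  page_frag_alloc() is the no-alignment wrapper
	 * around page_frag_alloc_align().
	 */
	static void *frag_alloc_va_example(struct page_frag_cache *nc,
					   unsigned int fragsz)
	{
		return page_frag_alloc(nc, fragsz, GFP_ATOMIC);
	}

	/* 2. tx-side style (net/core/sock.c): skb_page_frag_refill()
	 * fills 'struct page_frag' with page + offset, and the caller
	 * derives the address and advances the offset itself.
	 */
	static void *frag_refill_va_example(struct page_frag *pfrag,
					    unsigned int fragsz)
	{
		void *va;

		if (!skb_page_frag_refill(fragsz, pfrag, GFP_KERNEL))
			return NULL;

		va = page_address(pfrag->page) + pfrag->offset;
		pfrag->offset += fragsz;

		return va;
	}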