From: Amit Cohen <amcohen@xxxxxxxxxx>
Date: Sun, 17 Nov 2024 12:42:11 +0000

>
>
>> -----Original Message-----
>> From: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
>> Sent: Friday, 15 November 2024 16:35
>> To: Ido Schimmel <idosch@xxxxxxxxxx>
>> Cc: David S. Miller <davem@xxxxxxxxxxxxx>; Eric Dumazet <edumazet@xxxxxxxxxx>; Jakub Kicinski <kuba@xxxxxxxxxx>; Paolo Abeni
>> <pabeni@xxxxxxxxxx>; Toke Høiland-Jørgensen <toke@xxxxxxxxxx>; Alexei Starovoitov <ast@xxxxxxxxxx>; Daniel Borkmann
>> <daniel@xxxxxxxxxxxxx>; John Fastabend <john.fastabend@xxxxxxxxx>; Andrii Nakryiko <andrii@xxxxxxxxxx>; Maciej Fijalkowski
>> <maciej.fijalkowski@xxxxxxxxx>; Stanislav Fomichev <sdf@xxxxxxxxxxx>; Magnus Karlsson <magnus.karlsson@xxxxxxxxx>;
>> nex.sw.ncis.osdt.itp.upstreaming@xxxxxxxxx; bpf@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
>> Subject: Re: [PATCH net-next v5 12/19] xdp: add generic xdp_build_skb_from_buff()
>>
>> From: Ido Schimmel <idosch@xxxxxxxxxx>
>> Date: Thu, 14 Nov 2024 17:16:44 +0200
>>
>>> On Thu, Nov 14, 2024 at 05:06:06PM +0200, Ido Schimmel wrote:
>>>> Looks good (no objections to the patch), but I have a question. See
>>>> below.
>>>>
>>>> On Wed, Nov 13, 2024 at 04:24:35PM +0100, Alexander Lobakin wrote:
>>>>> The code which builds an skb from an &xdp_buff keeps multiplying itself
>>>>> around the drivers with almost no changes. Let's try to stop that by
>>>>> adding a generic function.
>>>>> Unlike __xdp_build_skb_from_frame(), always allocate an skbuff head
>>>>> using napi_build_skb() and make use of the available xdp_rxq pointer to
>>>>> assign the Rx queue index. In case of PP-backed buffer, mark the skb to
>>>>> be recycled, as every PP user's been switched to recycle skbs.
>>>>>
>>>>> Reviewed-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx>
>>>>> Signed-off-by: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
>>>>
>>>> Reviewed-by: Ido Schimmel <idosch@xxxxxxxxxx>
>>>>
>>>>> ---
>>>>>  include/net/xdp.h |  1 +
>>>>>  net/core/xdp.c    | 55 +++++++++++++++++++++++++++++++++++++++++++++++
>>>>>  2 files changed, 56 insertions(+)
>>>>>
>>>>> diff --git a/include/net/xdp.h b/include/net/xdp.h
>>>>> index 4c19042adf80..b0a25b7060ff 100644
>>>>> --- a/include/net/xdp.h
>>>>> +++ b/include/net/xdp.h
>>>>> @@ -330,6 +330,7 @@ xdp_update_skb_shared_info(struct sk_buff *skb, u8 nr_frags,
>>>>>  void xdp_warn(const char *msg, const char *func, const int line);
>>>>>  #define XDP_WARN(msg) xdp_warn(msg, __func__, __LINE__)
>>>>>
>>>>> +struct sk_buff *xdp_build_skb_from_buff(const struct xdp_buff *xdp);
>>>>>  struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp);
>>>>>  struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
>>>>>  					   struct sk_buff *skb,
>>>>> diff --git a/net/core/xdp.c b/net/core/xdp.c
>>>>> index b1b426a9b146..3a9a3c14b080 100644
>>>>> --- a/net/core/xdp.c
>>>>> +++ b/net/core/xdp.c
>>>>> @@ -624,6 +624,61 @@ int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp)
>>>>>  }
>>>>>  EXPORT_SYMBOL_GPL(xdp_alloc_skb_bulk);
>>>>>
>>>>> +/**
>>>>> + * xdp_build_skb_from_buff - create an skb from an &xdp_buff
>>>>> + * @xdp: &xdp_buff to convert to an skb
>>>>> + *
>>>>> + * Perform common operations to create a new skb to pass up the stack from
>>>>> + * an &xdp_buff: allocate an skb head from the NAPI percpu cache, initialize
>>>>> + * skb data pointers and offsets, set the recycle bit if the buff is PP-backed,
>>>>> + * Rx queue index, protocol and update frags info.
>>>>> + *
>>>>> + * Return: new &sk_buff on success, %NULL on error.
>>>>> + */
>>>>> +struct sk_buff *xdp_build_skb_from_buff(const struct xdp_buff *xdp)
>>>>> +{
>>>>> +	const struct xdp_rxq_info *rxq = xdp->rxq;
>>>>> +	const struct skb_shared_info *sinfo;
>>>>> +	struct sk_buff *skb;
>>>>> +	u32 nr_frags = 0;
>>>>> +	int metalen;
>>>>> +
>>>>> +	if (unlikely(xdp_buff_has_frags(xdp))) {
>>>>> +		sinfo = xdp_get_shared_info_from_buff(xdp);
>>>>> +		nr_frags = sinfo->nr_frags;
>>>>> +	}
>>>>> +
>>>>> +	skb = napi_build_skb(xdp->data_hard_start, xdp->frame_sz);
>>>>> +	if (unlikely(!skb))
>>>>> +		return NULL;
>>>>> +
>>>>> +	skb_reserve(skb, xdp->data - xdp->data_hard_start);
>>>>> +	__skb_put(skb, xdp->data_end - xdp->data);
>>>>> +
>>>>> +	metalen = xdp->data - xdp->data_meta;
>>>>> +	if (metalen > 0)
>>>>> +		skb_metadata_set(skb, metalen);
>>>>> +
>>>>> +	if (is_page_pool_compiled_in() && rxq->mem.type == MEM_TYPE_PAGE_POOL)
>>>>> +		skb_mark_for_recycle(skb);
>>>>> +
>>>>> +	skb_record_rx_queue(skb, rxq->queue_index);
>>>>> +
>>>>> +	if (unlikely(nr_frags)) {
>>>>> +		u32 tsize;
>>>>> +
>>>>> +		tsize = sinfo->xdp_frags_truesize ? : nr_frags * xdp->frame_sz;
>>>>> +		xdp_update_skb_shared_info(skb, nr_frags,
>>>>> +					   sinfo->xdp_frags_size, tsize,
>>>>> +					   xdp_buff_is_frag_pfmemalloc(xdp));
>>>>> +	}
>>>>> +
>>>>> +	skb->protocol = eth_type_trans(skb, rxq->dev);
>>>>
>>>> The device we are working with has more ports (net devices) than Rx
>>>> queues, so each queue can receive packets from different net devices.
>>>> Currently, each Rx queue has its own NAPI instance and its own page
>>>> pool. All the Rx NAPI instances are initialized using the same dummy net
>>>> device which is allocated using alloc_netdev_dummy().
>>>>
>>>> What are our options with regards to the XDP Rx queue info structure? As
>>>> evident by this patch, it does not seem valid to register one such
>>>> structure per Rx queue and pass the dummy net device.
>>>> Would it be valid to register one such structure per port (net device)
>>>> and pass zero for the queue index and NAPI ID?
>>>
>>> Actually, this does not seem to be valid either as we need to associate
>>> an XDP Rx queue info with the correct page pool :/
>>
>> Right.
>> But I'd say this association is slowly becoming redundant. For example,
>> idpf has up to 4 page_pools per queue and I only pass 1 of them to
>> rxq_info as there are no other options. Regardless, its frames get
>> processed correctly because we have the struct page::pp pointer + patch 9
>> from this series, which teaches put_page_bulk() to handle mixed bulks.
>>
>> Regarding your use case -- after calling this function, you are free to
>> overwrite any skb fields as this helper doesn't pass it up the stack.
>> For example, in the ice driver we have port reps and sometimes we need to
>> pass a different net_device, not the one saved in rxq_info. So when
>> switching to this function, we'll do eth_type_trans() once again (it's
>> under unlikely() in our code either way, as it's the switchdev slowpath).
>> Same for the queue number in rxq_info.
>
> With this series, maintaining 'struct xdp_mem_allocator' in hash-table looks unnecessary.
> If so, xdp_reg_mem_model() does not need 'allocator' when mem_type is Page-Pool.
>
> Is there a reason for not removing 'mem_id_ht'? With this patch, the nodes are no longer used.

Let me review this once again since I need to rebase it anyway. Maybe we
really could drop more code.

Thanks,
Olek
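
For readers following the quoted diff, the offset arithmetic it performs can
be modelled outside the kernel. The following is a minimal userspace sketch,
not kernel code: struct buff_model, struct layout, buff_layout() and
frags_truesize() are hypothetical names invented here for illustration. Only
the field names mirror the xdp_buff fields that appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the pointer fields of struct xdp_buff. */
struct buff_model {
	unsigned char *data_hard_start;	/* start of the buffer, headroom begins here */
	unsigned char *data_meta;	/* start of metadata, normally <= data */
	unsigned char *data;		/* start of packet data */
	unsigned char *data_end;	/* end of packet data */
	uint32_t frame_sz;		/* size of the whole buffer */
};

/* The three values the quoted function derives from the pointers. */
struct layout {
	ptrdiff_t headroom;	/* argument to skb_reserve() in the patch */
	ptrdiff_t len;		/* argument to __skb_put() in the patch */
	ptrdiff_t metalen;	/* passed to skb_metadata_set() only when > 0 */
};

static struct layout buff_layout(const struct buff_model *b)
{
	struct layout l;

	/* Sanity check for this model only; the kernel helper does not do this. */
	assert(b->data_hard_start <= b->data && b->data <= b->data_end);

	l.headroom = b->data - b->data_hard_start;
	l.len = b->data_end - b->data;
	/*
	 * In the kernel, xdp_set_data_meta_invalid() sets data_meta to
	 * data + 1, so this difference can be negative. That is why the
	 * patch uses a signed metalen and tests "metalen > 0".
	 */
	l.metalen = b->data - b->data_meta;
	return l;
}

/*
 * Same result as "xdp_frags_truesize ? : nr_frags * frame_sz" in the patch,
 * written without the GNU "?:" extension: use the reported truesize when it
 * is non-zero, otherwise assume every frag occupies a full frame.
 */
static uint32_t frags_truesize(uint32_t reported, uint32_t nr_frags,
			       uint32_t frame_sz)
{
	return reported ? reported : nr_frags * frame_sz;
}
```

With a 2048-byte buffer, 256 bytes of headroom, 8 bytes of metadata and a
60-byte packet, buff_layout() yields headroom 256, len 60 and metalen 8.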