> -----Original Message-----
> From: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
> Sent: Friday, 15 November 2024 16:35
> To: Ido Schimmel <idosch@xxxxxxxxxx>
> Cc: David S. Miller <davem@xxxxxxxxxxxxx>; Eric Dumazet <edumazet@xxxxxxxxxx>; Jakub Kicinski <kuba@xxxxxxxxxx>; Paolo Abeni
> <pabeni@xxxxxxxxxx>; Toke Høiland-Jørgensen <toke@xxxxxxxxxx>; Alexei Starovoitov <ast@xxxxxxxxxx>; Daniel Borkmann
> <daniel@xxxxxxxxxxxxx>; John Fastabend <john.fastabend@xxxxxxxxx>; Andrii Nakryiko <andrii@xxxxxxxxxx>; Maciej Fijalkowski
> <maciej.fijalkowski@xxxxxxxxx>; Stanislav Fomichev <sdf@xxxxxxxxxxx>; Magnus Karlsson <magnus.karlsson@xxxxxxxxx>;
> nex.sw.ncis.osdt.itp.upstreaming@xxxxxxxxx; bpf@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH net-next v5 12/19] xdp: add generic xdp_build_skb_from_buff()
>
> From: Ido Schimmel <idosch@xxxxxxxxxx>
> Date: Thu, 14 Nov 2024 17:16:44 +0200
>
> > On Thu, Nov 14, 2024 at 05:06:06PM +0200, Ido Schimmel wrote:
> >> Looks good (no objections to the patch), but I have a question. See
> >> below.
> >>
> >> On Wed, Nov 13, 2024 at 04:24:35PM +0100, Alexander Lobakin wrote:
> >>> The code which builds an skb from an &xdp_buff keeps multiplying itself
> >>> around the drivers with almost no changes. Let's try to stop that by
> >>> adding a generic function.
> >>> Unlike __xdp_build_skb_from_frame(), always allocate an skbuff head
> >>> using napi_build_skb() and make use of the available xdp_rxq pointer to
> >>> assign the Rx queue index. In case of PP-backed buffer, mark the skb to
> >>> be recycled, as every PP user's been switched to recycle skbs.
> >>>
> >>> Reviewed-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx>
> >>> Signed-off-by: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
> >>
> >> Reviewed-by: Ido Schimmel <idosch@xxxxxxxxxx>
> >>
> >>> ---
> >>>  include/net/xdp.h |  1 +
> >>>  net/core/xdp.c    | 55 +++++++++++++++++++++++++++++++++++++++++++++++
> >>>  2 files changed, 56 insertions(+)
> >>>
> >>> diff --git a/include/net/xdp.h b/include/net/xdp.h
> >>> index 4c19042adf80..b0a25b7060ff 100644
> >>> --- a/include/net/xdp.h
> >>> +++ b/include/net/xdp.h
> >>> @@ -330,6 +330,7 @@ xdp_update_skb_shared_info(struct sk_buff *skb, u8 nr_frags,
> >>>  void xdp_warn(const char *msg, const char *func, const int line);
> >>>  #define XDP_WARN(msg) xdp_warn(msg, __func__, __LINE__)
> >>>
> >>> +struct sk_buff *xdp_build_skb_from_buff(const struct xdp_buff *xdp);
> >>>  struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp);
> >>>  struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
> >>>  					   struct sk_buff *skb,
> >>> diff --git a/net/core/xdp.c b/net/core/xdp.c
> >>> index b1b426a9b146..3a9a3c14b080 100644
> >>> --- a/net/core/xdp.c
> >>> +++ b/net/core/xdp.c
> >>> @@ -624,6 +624,61 @@ int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp)
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(xdp_alloc_skb_bulk);
> >>>
> >>> +/**
> >>> + * xdp_build_skb_from_buff - create an skb from an &xdp_buff
> >>> + * @xdp: &xdp_buff to convert to an skb
> >>> + *
> >>> + * Perform common operations to create a new skb to pass up the stack from
> >>> + * an &xdp_buff: allocate an skb head from the NAPI percpu cache, initialize
> >>> + * skb data pointers and offsets, set the recycle bit if the buff is PP-backed,
> >>> + * Rx queue index, protocol and update frags info.
> >>> + *
> >>> + * Return: new &sk_buff on success, %NULL on error.
> >>> + */
> >>> +struct sk_buff *xdp_build_skb_from_buff(const struct xdp_buff *xdp)
> >>> +{
> >>> +	const struct xdp_rxq_info *rxq = xdp->rxq;
> >>> +	const struct skb_shared_info *sinfo;
> >>> +	struct sk_buff *skb;
> >>> +	u32 nr_frags = 0;
> >>> +	int metalen;
> >>> +
> >>> +	if (unlikely(xdp_buff_has_frags(xdp))) {
> >>> +		sinfo = xdp_get_shared_info_from_buff(xdp);
> >>> +		nr_frags = sinfo->nr_frags;
> >>> +	}
> >>> +
> >>> +	skb = napi_build_skb(xdp->data_hard_start, xdp->frame_sz);
> >>> +	if (unlikely(!skb))
> >>> +		return NULL;
> >>> +
> >>> +	skb_reserve(skb, xdp->data - xdp->data_hard_start);
> >>> +	__skb_put(skb, xdp->data_end - xdp->data);
> >>> +
> >>> +	metalen = xdp->data - xdp->data_meta;
> >>> +	if (metalen > 0)
> >>> +		skb_metadata_set(skb, metalen);
> >>> +
> >>> +	if (is_page_pool_compiled_in() && rxq->mem.type == MEM_TYPE_PAGE_POOL)
> >>> +		skb_mark_for_recycle(skb);
> >>> +
> >>> +	skb_record_rx_queue(skb, rxq->queue_index);
> >>> +
> >>> +	if (unlikely(nr_frags)) {
> >>> +		u32 tsize;
> >>> +
> >>> +		tsize = sinfo->xdp_frags_truesize ? : nr_frags * xdp->frame_sz;
> >>> +		xdp_update_skb_shared_info(skb, nr_frags,
> >>> +					   sinfo->xdp_frags_size, tsize,
> >>> +					   xdp_buff_is_frag_pfmemalloc(xdp));
> >>> +	}
> >>> +
> >>> +	skb->protocol = eth_type_trans(skb, rxq->dev);
> >>
> >> The device we are working with has more ports (net devices) than Rx
> >> queues, so each queue can receive packets from different net devices.
> >> Currently, each Rx queue has its own NAPI instance and its own page
> >> pool. All the Rx NAPI instances are initialized using the same dummy net
> >> device which is allocated using alloc_netdev_dummy().
> >>
> >> What are our options with regards to the XDP Rx queue info structure? As
> >> evident by this patch, it does not seem valid to register one such
> >> structure per Rx queue and pass the dummy net device.
> >> Would it be valid
> >> to register one such structure per port (net device) and pass zero for
> >> the queue index and NAPI ID?
> >
> > Actually, this does not seem to be valid either as we need to associate
> > an XDP Rx queue info with the correct page pool :/
>
> Right.
> But I'd say, this assoc slowly becomes redundant. For example, idpf has
> up to 4 page_pools per queue and I only pass 1 of them to rxq_info as
> there are no other options. Regardless, its frames get processed
> correctly thanks to that we have struct page::pp pointer + patch 9 from
> this series which teaches put_page_bulk() to handle mixed bulks.
>
> Regarding your usecase -- after calling this function, you are free to
> overwrite any skb fields as this helper doesn't pass it up the stack.
> For example, in ice driver we have port reps and sometimes we need to
> pass a different net_device, not the one saved in rxq_info. So when
> switching to this function, we'll do eth_type_trans() once again (it's
> either way under unlikely() in our code as it's switchdev slowpath).
> Same for the queue number in rxq_info.

With this series, maintaining 'struct xdp_mem_allocator' in the hash table looks unnecessary.
If so, xdp_reg_mem_model() does not need 'allocator' when mem_type is Page-Pool.
Is there a reason for not removing 'mem_id_ht'? With this patch, the nodes are no longer used.

>
> >
> >>
> >> To be clear, I understand it is not a common use case.
> >>
> >> Thanks
>
> Thanks,
> Olek