On Tue, Dec 12, 2023 at 6:37 PM Liang Chen <liangchen.linux@xxxxxxxxx> wrote:
>
> On Wed, Dec 13, 2023 at 9:49 AM Mina Almasry <almasrymina@xxxxxxxxxx> wrote:
> >
> > On Mon, Dec 11, 2023 at 8:47 PM Liang Chen <liangchen.linux@xxxxxxxxx> wrote:
> > >
> > > In order to address the issues encountered with commit 1effe8ca4e34
> > > ("skbuff: fix coalescing for page_pool fragment recycling"), the
> > > following combination of conditions was excluded from skb coalescing:
> > >
> > >    from->pp_recycle = 1
> > >    from->cloned = 1
> > >    to->pp_recycle = 1
> > >
> > > However, with page pool environments, the aforementioned combination can
> > > be quite common (e.g. NetworkManager may lead to an additional
> > > packet_type being registered, thus the cloning). In scenarios with a
> > > higher number of small packets, it can significantly affect the success
> > > rate of coalescing. For example, considering packets of 256 bytes,
> > > our comparison of coalescing success rates is as follows:
> > >
> > >    Without page pool: 70%
> > >    With page pool: 13%
> > >
> > > Consequently, this has an impact on performance:
> > >
> > >    Without page pool: 2.57 Gbits/sec
> > >    With page pool: 2.26 Gbits/sec
> > >
> > > Therefore, it seems worthwhile to optimize this scenario and enable
> > > coalescing of this particular combination. To achieve this, we need to
> > > ensure the correct increment of the "from" SKB page's page pool
> > > reference count (pp_ref_count).
> > >
> > > Following this optimization, the success rate of coalescing measured in
> > > our environment has improved as follows:
> > >
> > >    With page pool: 60%
> > >
> > > This success rate is approaching the rate achieved without using page
> > > pool, and the performance has also been improved:
> > >
> > >    With page pool: 2.52 Gbits/sec
> > >
> > > Below is the performance comparison for small packets before and after
> > > this optimization. We observe no impact to packets larger than 4K.
> > >
> > > packet size     before          after           improved
> > > (bytes)         (Gbits/sec)     (Gbits/sec)
> > > 128             1.19            1.27            7.13%
> > > 256             2.26            2.52            11.75%
> > > 512             4.13            4.81            16.50%
> > > 1024            6.17            6.73            9.05%
> > > 2048            14.54           15.47           6.45%
> > > 4096            25.44           27.87           9.52%
> > >
> > > Signed-off-by: Liang Chen <liangchen.linux@xxxxxxxxx>
> > > Reviewed-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx>
> > > Suggested-by: Jason Wang <jasowang@xxxxxxxxxx>
> > > ---
> > >  include/net/page_pool/helpers.h |  5 ++++
> > >  net/core/skbuff.c               | 43 ++++++++++++++++++++++++---------
> > >  2 files changed, 36 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> > > index d0c5e7e6857a..0dc8fab43bef 100644
> > > --- a/include/net/page_pool/helpers.h
> > > +++ b/include/net/page_pool/helpers.h
> > > @@ -281,6 +281,11 @@ static inline long page_pool_unref_page(struct page *page, long nr)
> > >  	return ret;
> > >  }
> > >
> > > +static inline void page_pool_ref_page(struct page *page)
> > > +{
> > > +	atomic_long_inc(&page->pp_ref_count);
> > > +}
> > > +
> > >  static inline bool page_pool_is_last_ref(struct page *page)
> > >  {
> > >  	/* If page_pool_unref_page() returns 0, we were the last user */
> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > index 7e26b56cda38..783a04733109 100644
> > > --- a/net/core/skbuff.c
> > > +++ b/net/core/skbuff.c
> > > @@ -947,6 +947,26 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool napi_safe)
> > >  	return napi_pp_put_page(virt_to_page(data), napi_safe);
> > >  }
> > >
> > > +/**
> > > + * skb_pp_frag_ref() - Increase fragment reference count of a page
> > > + * @page:	page of the fragment on which to increase a reference
> > > + *
> > > + * Increase the fragment reference count (pp_ref_count) of a page. This is
> > > + * intended to gain a fragment reference only for page pool aware skbs,
> > > + * i.e. when skb->pp_recycle is true, and not for fragments in a
> > > + * non-pp-recycling skb. It has a fallback to increase a reference on a
> > > + * normal page, as page pool aware skbs may also have normal page fragments.
> > > + */
> > > +static void skb_pp_frag_ref(struct page *page)
> > > +{
> > > +	struct page *head_page = compound_head(page);
> > > +
> >
> > Feel free to not delay this patch series further based on this
> > comment/question, but...
> >
> > I'm a bit confused about the need for compound_head() here, but
> > skb_frag_ref() doesn't first obtain the compound_head(). Is there a
> > page_pool specific reason why skb_frag_ref() can get_page() directly
> > but this helper needs to grab the compound_head() first?
> >
>
> get_page includes the call to compound_head, so skb_frag_ref
> indirectly calls compound_head as well.
>
> > > +	if (likely(is_pp_page(head_page)))
> > > +		page_pool_ref_page(head_page);
> > > +	else
> > > +		page_ref_inc(head_page);
> >
> > Any reason why not get_page() here?
> >
>
> head_page is a head page because of the compound_head call above. This
> was actually a comment received from a previous iteration:)
>

I see, thanks.

Reviewed-by: Mina Almasry <almasrymina@xxxxxxxxxx>

Noob question: do we actually support someone passing a compound_page
to skb_frag_fill_page_desc()? Anyone know of any driver that does
this?

I kinda like the direction this patch was going instead:

https://patchwork.kernel.org/project/netdevbpf/patch/20231113130041.58124-5-linyunsheng@xxxxxxxxxx/

Where we explicitly exclude compound pages from skbs... This is for
convenience for devmem TCP, where I don't support compound pages, but
that is more my problem than yours. This patch is fine.

> > > +}
> > > +
> > >  static void skb_kfree_head(void *head, unsigned int end_offset)
> > >  {
> > >  	if (end_offset == SKB_SMALL_HEAD_HEADROOM)
> > > @@ -5769,17 +5789,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> > >  		return false;
> > >
> > >  	/* In general, avoid mixing page_pool and non-page_pool allocated
> > > -	 * pages within the same SKB. Additionally avoid dealing with clones
> > > -	 * with page_pool pages, in case the SKB is using page_pool fragment
> > > -	 * references (page_pool_alloc_frag()). Since we only take full page
> > > -	 * references for cloned SKBs at the moment that would result in
> > > -	 * inconsistent reference counts.
> > > -	 * In theory we could take full references if @from is cloned and
> > > -	 * !@to->pp_recycle but its tricky (due to potential race with
> > > -	 * the clone disappearing) and rare, so not worth dealing with.
> > > +	 * pages within the same SKB. In theory we could take full
> > > +	 * references if @from is cloned and !@to->pp_recycle but its
> > > +	 * tricky (due to potential race with the clone disappearing) and
> > > +	 * rare, so not worth dealing with.
> > >  	 */
> > > -	if (to->pp_recycle != from->pp_recycle ||
> > > -	    (from->pp_recycle && skb_cloned(from)))
> > > +	if (to->pp_recycle != from->pp_recycle)
> > >  		return false;
> > >
> > >  	if (len <= skb_tailroom(to)) {
> > > @@ -5836,8 +5851,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> > >  	/* if the skb is not cloned this does nothing
> > >  	 * since we set nr_frags to 0.
> > >  	 */
> > > -	for (i = 0; i < from_shinfo->nr_frags; i++)
> > > -		__skb_frag_ref(&from_shinfo->frags[i]);
> > > +	if (from->pp_recycle)
> > > +		for (i = 0; i < from_shinfo->nr_frags; i++)
> > > +			skb_pp_frag_ref(skb_frag_page(&from_shinfo->frags[i]));
> > > +	else
> > > +		for (i = 0; i < from_shinfo->nr_frags; i++)
> > > +			__skb_frag_ref(&from_shinfo->frags[i]);
> > >
> > >  	to->truesize += delta;
> > >  	to->len += len;
> > > --
> > > 2.31.1
> > >
> >
> >
> > --
> > Thanks,
> > Mina



--
Thanks,
Mina
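
For anyone following the refcounting discussion above, the branch taken by
skb_pp_frag_ref() can be modelled in userspace C. This is only a rough
sketch, not kernel code: struct model_page, model_pp_frag_ref() and
model_coalesce_cloned() are hypothetical names made up for illustration,
the is_pp flag stands in for is_pp_page(), and the two counters stand in
for pp_ref_count and the ordinary page refcount. It assumes the caller has
already resolved each fragment to its head page.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: this is NOT the kernel's struct page. */
struct model_page {
	bool is_pp;               /* stands in for is_pp_page() */
	atomic_long pp_ref_count; /* page pool fragment references */
	atomic_int page_refcount; /* ordinary page reference count */
};

/*
 * Models the branch in skb_pp_frag_ref(): a page pool page gains a
 * fragment reference, any other page gains a normal page reference.
 */
static void model_pp_frag_ref(struct model_page *head)
{
	if (head->is_pp)
		atomic_fetch_add(&head->pp_ref_count, 1);
	else
		atomic_fetch_add(&head->page_refcount, 1);
}

/*
 * Models the loop run when a cloned, pp_recycle "from" skb is coalesced:
 * each fragment page gains one reference on behalf of "to".
 */
static void model_coalesce_cloned(struct model_page **frags, int nr_frags)
{
	assert(frags != NULL || nr_frags == 0);

	for (int i = 0; i < nr_frags; i++)
		model_pp_frag_ref(frags[i]);
}
```

Calling model_coalesce_cloned() on an array holding one page pool page and
one normal page bumps pp_ref_count on the first and page_refcount on the
second, and leaves the other counter of each page untouched. That is the
mixed-fragment case the kernel-doc comment in the patch describes.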