Re: [PATCH net-next v9 4/4] skbuff: Optimization of SKB coalescing for page pool

Mina Almasry <almasrymina@xxxxxxxxxx> · Tue, 12 Dec 2023 18:49:40 -0800

On Tue, Dec 12, 2023 at 6:37 PM Liang Chen <liangchen.linux@xxxxxxxxx> wrote:
>
> On Wed, Dec 13, 2023 at 9:49 AM Mina Almasry <almasrymina@xxxxxxxxxx> wrote:
> >
> > On Mon, Dec 11, 2023 at 8:47 PM Liang Chen <liangchen.linux@xxxxxxxxx> wrote:
> > >
> > > In order to address the issues encountered with commit 1effe8ca4e34
> > > ("skbuff: fix coalescing for page_pool fragment recycling"), the
> > > combination of the following conditions was excluded from skb coalescing:
> > >
> > > from->pp_recycle = 1
> > > from->cloned = 1
> > > to->pp_recycle = 1
> > >
> > > However, with page pool environments, the aforementioned combination can
> > > be quite common (e.g. NetworkManager may lead to an additional
> > > packet_type being registered, thus the cloning). In scenarios with a
> > > higher number of small packets, it can significantly affect the success
> > > rate of coalescing. For example, considering packets of 256 bytes size,
> > > our comparison of coalescing success rate is as follows:
> > >
> > > Without page pool: 70%
> > > With page pool: 13%
> > >
> > > Consequently, this has an impact on performance:
> > >
> > > Without page pool: 2.57 Gbits/sec
> > > With page pool: 2.26 Gbits/sec
> > >
> > > Therefore, it seems worthwhile to optimize this scenario and enable
> > > coalescing of this particular combination. To achieve this, we need to
> > > ensure the correct increment of the "from" SKB page's page pool
> > > reference count (pp_ref_count).
> > >
> > > Following this optimization, the success rate of coalescing measured in
> > > our environment has improved as follows:
> > >
> > > With page pool: 60%
> > >
> > > This success rate is approaching the rate achieved without using page
> > > pool, and the performance has also been improved:
> > >
> > > With page pool: 2.52 Gbits/sec
> > >
> > > Below is the performance comparison for small packets before and after
> > > this optimization. We observe no impact to packets larger than 4K.
> > >
> > > packet size     before      after       improved
> > > (bytes)         (Gbits/sec) (Gbits/sec)
> > > 128             1.19        1.27        7.13%
> > > 256             2.26        2.52        11.75%
> > > 512             4.13        4.81        16.50%
> > > 1024            6.17        6.73        9.05%
> > > 2048            14.54       15.47       6.45%
> > > 4096            25.44       27.87       9.52%
> > >
> > > Signed-off-by: Liang Chen <liangchen.linux@xxxxxxxxx>
> > > Reviewed-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx>
> > > Suggested-by: Jason Wang <jasowang@xxxxxxxxxx>
> > > ---
> > >  include/net/page_pool/helpers.h |  5 ++++
> > >  net/core/skbuff.c               | 43 ++++++++++++++++++++++++---------
> > >  2 files changed, 36 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> > > index d0c5e7e6857a..0dc8fab43bef 100644
> > > --- a/include/net/page_pool/helpers.h
> > > +++ b/include/net/page_pool/helpers.h
> > > @@ -281,6 +281,11 @@ static inline long page_pool_unref_page(struct page *page, long nr)
> > >         return ret;
> > >  }
> > >
> > > +static inline void page_pool_ref_page(struct page *page)
> > > +{
> > > +       atomic_long_inc(&page->pp_ref_count);
> > > +}
> > > +
> > >  static inline bool page_pool_is_last_ref(struct page *page)
> > >  {
> > >         /* If page_pool_unref_page() returns 0, we were the last user */
> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > index 7e26b56cda38..783a04733109 100644
> > > --- a/net/core/skbuff.c
> > > +++ b/net/core/skbuff.c
> > > @@ -947,6 +947,26 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool napi_safe)
> > >         return napi_pp_put_page(virt_to_page(data), napi_safe);
> > >  }
> > >
> > > +/**
> > > + * skb_pp_frag_ref() - Increase fragment reference count of a page
> > > + * @page:      page of the fragment on which to increase a reference
> > > + *
> > > + * Increase the fragment reference count (pp_ref_count) of a page. This is
> > > + * intended to gain a fragment reference only for page pool aware skbs,
> > > + * i.e. when skb->pp_recycle is true, and not for fragments in a
> > > + * non-pp-recycling skb. It has a fallback to increase a reference on a
> > > + * normal page, as page pool aware skbs may also have normal page fragments.
> > > + */
> > > +static void skb_pp_frag_ref(struct page *page)
> > > +{
> > > +       struct page *head_page = compound_head(page);
> > > +
> >
> > Feel free to not delay this patch series further based on this
> > comment/question, but...
> >
> > I'm a bit confused about the need for compound_head() here, given that
> > skb_frag_ref() doesn't first obtain the compound_head(). Is there a
> > page_pool specific reason why skb_frag_ref() can get_page() directly
> > but this helper needs to grab the compound_head() first?
> >
>
> get_page includes the call to compound_head, so skb_frag_ref
> indirectly calls compound_head as well.
>
> > > +       if (likely(is_pp_page(head_page)))
> > > +               page_pool_ref_page(head_page);
> > > +       else
> > > +               page_ref_inc(head_page);
> >
> > Any reason why not get_page() here?
> >
>
> head_page is a head page because of the compound_head call above. This
> was actually a comment received from a previous iteration:)
>

I see, thanks.

Reviewed-by: Mina Almasry <almasrymina@xxxxxxxxxx>

Noob question: do we actually support someone passing a compound page
to skb_frag_fill_page_desc()? Does anyone know of a driver that does
this? I kinda like the direction this patch was going instead:

https://patchwork.kernel.org/project/netdevbpf/patch/20231113130041.58124-5-linyunsheng@xxxxxxxxxx/

Where we explicitly exclude compound pages from skbs... This is for
convenience for devmem TCP, where I don't support compound pages, but
that is more my problem than yours. This patch is fine.
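
For anyone following along, here is a minimal userspace sketch of the two
answers above. It is only a model: struct toy_page and every toy_* name are
made up for illustration and are not kernel APIs, and the real struct page,
get_page() and page pool helpers are more involved. The one point it tries
to show is that get_page() resolves the head page itself, so once
skb_pp_frag_ref() has called compound_head(), a plain increment on
head_page does the same job without a second lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; NOT the kernel layout (assumption). */
struct toy_page {
	struct toy_page *head;	/* NULL for a head/order-0 page, else the head */
	bool is_pp;		/* models is_pp_page(): page owned by a page pool */
	long refcount;		/* models the normal page reference count */
	long pp_ref_count;	/* models page->pp_ref_count */
};

/* Models compound_head(): a tail page maps to its head page. */
static struct toy_page *toy_compound_head(struct toy_page *page)
{
	return page->head ? page->head : page;
}

/* Models get_page(): looks up the head page, then takes the reference. */
static void toy_get_page(struct toy_page *page)
{
	toy_compound_head(page)->refcount++;
}

/* Models skb_pp_frag_ref() from the patch under discussion. */
static void toy_skb_pp_frag_ref(struct toy_page *page)
{
	struct toy_page *head_page = toy_compound_head(page);

	if (head_page->is_pp)
		head_page->pp_ref_count++;	/* page pool fragment reference */
	else
		head_page->refcount++;		/* head already resolved above */
}

/* Self-check: both helpers land on the head page, never on the tail. */
int toy_demo(void)
{
	struct toy_page head = { .head = NULL, .is_pp = false, .refcount = 1 };
	struct toy_page tail = { .head = &head };

	toy_get_page(&tail);		/* head refcount 1 -> 2 */
	toy_skb_pp_frag_ref(&tail);	/* non-pp fallback: 2 -> 3 */
	assert(head.refcount == 3 && tail.refcount == 0);
	return 0;
}
```

In this model the non-pp branch and toy_get_page() are interchangeable for
a tail page, which is the gist of why page_ref_inc(head_page) is enough in
the helper.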

> > > +}
> > > +
> > >  static void skb_kfree_head(void *head, unsigned int end_offset)
> > >  {
> > >         if (end_offset == SKB_SMALL_HEAD_HEADROOM)
> > > @@ -5769,17 +5789,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> > >                 return false;
> > >
> > >         /* In general, avoid mixing page_pool and non-page_pool allocated
> > > -        * pages within the same SKB. Additionally avoid dealing with clones
> > > -        * with page_pool pages, in case the SKB is using page_pool fragment
> > > -        * references (page_pool_alloc_frag()). Since we only take full page
> > > -        * references for cloned SKBs at the moment that would result in
> > > -        * inconsistent reference counts.
> > > -        * In theory we could take full references if @from is cloned and
> > > -        * !@to->pp_recycle but its tricky (due to potential race with
> > > -        * the clone disappearing) and rare, so not worth dealing with.
> > > +        * pages within the same SKB. In theory we could take full
> > > +        * references if @from is cloned and !@to->pp_recycle but its
> > > +        * tricky (due to potential race with the clone disappearing) and
> > > +        * rare, so not worth dealing with.
> > >          */
> > > -       if (to->pp_recycle != from->pp_recycle ||
> > > -           (from->pp_recycle && skb_cloned(from)))
> > > +       if (to->pp_recycle != from->pp_recycle)
> > >                 return false;
> > >
> > >         if (len <= skb_tailroom(to)) {
> > > @@ -5836,8 +5851,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> > >         /* if the skb is not cloned this does nothing
> > >          * since we set nr_frags to 0.
> > >          */
> > > -       for (i = 0; i < from_shinfo->nr_frags; i++)
> > > -               __skb_frag_ref(&from_shinfo->frags[i]);
> > > +       if (from->pp_recycle)
> > > +               for (i = 0; i < from_shinfo->nr_frags; i++)
> > > +                       skb_pp_frag_ref(skb_frag_page(&from_shinfo->frags[i]));
> > > +       else
> > > +               for (i = 0; i < from_shinfo->nr_frags; i++)
> > > +                       __skb_frag_ref(&from_shinfo->frags[i]);
> > >
> > >         to->truesize += delta;
> > >         to->len += len;
> > > --
> > > 2.31.1
> > >
> >
> >
> > --
> > Thanks,
> > Mina

-- 
Thanks,
Mina