Re: [PATCH v5 net-next 2/5] net: page_pool: add bulk support for ptr_ring

Jesper Dangaard Brouer <brouer@xxxxxxxxxx> · Wed, 11 Nov 2020 13:59:53 +0100

On Wed, 11 Nov 2020 11:43:31 +0100
Lorenzo Bianconi <lorenzo@xxxxxxxxxx> wrote:

> > Lorenzo Bianconi wrote:  
> > > Introduce the capability to batch page_pool ptr_ring refill since it is
> > > usually run inside the driver NAPI tx completion loop.
> > > 
> > > Suggested-by: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
> > > Co-developed-by: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
> > > Signed-off-by: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
> > > Signed-off-by: Lorenzo Bianconi <lorenzo@xxxxxxxxxx>
> > > ---
> > >  include/net/page_pool.h | 26 ++++++++++++++++
> > >  net/core/page_pool.c    | 69 +++++++++++++++++++++++++++++++++++------
> > >  net/core/xdp.c          |  9 ++----
> > >  3 files changed, 87 insertions(+), 17 deletions(-)  
> > 
> > [...]
> >   
> > > +/* Caller must not use data area after call, as this function overwrites it */
> > > +void page_pool_put_page_bulk(struct page_pool *pool, void **data,
> > > +			     int count)
> > > +{
> > > +	int i, bulk_len = 0, pa_len = 0;
> > > +
> > > +	for (i = 0; i < count; i++) {
> > > +		struct page *page = virt_to_head_page(data[i]);
> > > +
> > > +		page = __page_pool_put_page(pool, page, -1, false);
> > > +		/* Approved for bulk recycling in ptr_ring cache */
> > > +		if (page)
> > > +			data[bulk_len++] = page;
> > > +	}
> > > +
> > > +	if (unlikely(!bulk_len))
> > > +		return;
> > > +
> > > +	/* Bulk producer into ptr_ring page_pool cache */
> > > +	page_pool_ring_lock(pool);
> > > +	for (i = 0; i < bulk_len; i++) {
> > > +		if (__ptr_ring_produce(&pool->ring, data[i]))
> > > +			data[pa_len++] = data[i];  
> > 
> > How about bailing out on the first error? bulk_len should be less than
> > 16 right, so should we really keep retrying hoping ring->size changes?  
> 
> do you mean doing something like:
> 
> 	page_pool_ring_lock(pool);
> 	if (__ptr_ring_full(&pool->ring)) {
> 		pa_len = bulk_len;
> 		page_pool_ring_unlock(pool);
> 		goto out;
> 	}
> 	...
> out:
> 	for (i = 0; i < pa_len; i++) {
> 		...
> 	}

I think this is the change John is looking for:

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index a06606f07df0..3093fe4e1cd7 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -424,7 +424,7 @@ EXPORT_SYMBOL(page_pool_put_page);
 void page_pool_put_page_bulk(struct page_pool *pool, void **data,
                             int count)
 {
-       int i, bulk_len = 0, pa_len = 0;
+       int i, bulk_len = 0;
        bool order0 = (pool->p.order == 0);
 
        for (i = 0; i < count; i++) {
@@ -448,17 +448,18 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data,
        page_pool_ring_lock(pool);
        for (i = 0; i < bulk_len; i++) {
                if (__ptr_ring_produce(&pool->ring, data[i]))
-                       data[pa_len++] = data[i];
+                       break; /* ring_full */
        }
        page_pool_ring_unlock(pool);
 
-       if (likely(!pa_len))
+       /* Hopefully all pages were returned into the ptr_ring */
+       if (likely(i == bulk_len))
                return;
 
-       /* ptr_ring cache full, free pages outside producer lock since
-        * put_page() with refcnt == 1 can be an expensive operation
+       /* ptr_ring cache full, free remaining pages outside producer lock
+        * since put_page() with refcnt == 1 can be an expensive operation
         */
-       for (i = 0; i < pa_len; i++)
+       for (; i < bulk_len; i++)
                page_pool_return_page(pool, data[i]);
 }
 EXPORT_SYMBOL(page_pool_put_page_bulk);


> I do not know if it is better or not since the consumer can run in
> parallel. @Jesper/Ilias: any idea?

Currently it is not very likely that the consumer runs in parallel, but
it is possible. (As you experienced in your test lab with mlx5, the
DMA-TX completion did run on another CPU, but I asked you to
reconfigure it to run on the same CPU, as the cross-CPU setup is
suboptimal.) Once we (finally) support this memory type for SKBs, this
will become more common.

But John is right: for ptr_ring we should exit as soon as the first
"produce" fails. I know this from how ptr_ring works internally: the
"consumer" frees slots in chunks of 16, so it is not very likely that a
slot opens up while we are still looping over the bulk.

> >   
> > > +	}
> > > +	page_pool_ring_unlock(pool);
> > > +
> > > +	if (likely(!pa_len))
> > > +		return;
> > > +
> > > +	/* ptr_ring cache full, free pages outside producer lock since
> > > +	 * put_page() with refcnt == 1 can be an expensive operation
> > > +	 */
> > > +	for (i = 0; i < pa_len; i++)
> > > +		page_pool_return_page(pool, data[i]);
> > > +}
> > > +EXPORT_SYMBOL(page_pool_put_page_bulk);
> > > +  
> > 
> > Otherwise LGTM.  



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer