On 2021/7/11 1:43, Alexander Duyck wrote: > On Sat, Jul 10, 2021 at 12:44 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote: >> >> Currently each desc use a whole page to do ping-pong page >> reusing in most driver. As the page pool has support page >> recycling based on elevated refcnt, it makes sense to add >> a page frag API in page pool to split a page to different >> frag to serve multi descriptions. >> >> This means a huge memory saving for kernel with page size of >> 64K, as a page can be used by 32 descriptions with 2k buffer >> size, comparing to each desc using one page currently. >> >> Signed-off-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx> >> --- >> include/net/page_pool.h | 14 ++++++++++++++ >> net/core/page_pool.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ >> 2 files changed, 63 insertions(+) >> >> diff --git a/include/net/page_pool.h b/include/net/page_pool.h >> index f0e708d..06a5e43 100644 >> --- a/include/net/page_pool.h >> +++ b/include/net/page_pool.h >> @@ -80,6 +80,7 @@ struct page_pool_params { >> enum dma_data_direction dma_dir; /* DMA mapping direction */ >> unsigned int max_len; /* max DMA sync memory size */ >> unsigned int offset; /* DMA addr offset */ >> + unsigned int frag_size; >> }; >> >> struct page_pool { >> @@ -91,6 +92,8 @@ struct page_pool { >> unsigned long defer_warn; >> >> u32 pages_state_hold_cnt; >> + unsigned int frag_offset; >> + struct page *frag_page; >> >> /* >> * Data structure for allocation side >> @@ -140,6 +143,17 @@ static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool) >> return page_pool_alloc_pages(pool, gfp); >> } >> >> +struct page *page_pool_alloc_frag(struct page_pool *pool, >> + unsigned int *offset, gfp_t gfp); >> + >> +static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool, >> + unsigned int *offset) >> +{ >> + gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN); >> + >> + return page_pool_alloc_frag(pool, offset, gfp); >> +} >> + >> /* get the stored dma direction. A driver might decide to treat this locally and >> * avoid the extra cache line from page_pool to determine the direction >> */ >> diff --git a/net/core/page_pool.c b/net/core/page_pool.c >> index a87cbe1..b787033 100644 >> --- a/net/core/page_pool.c >> +++ b/net/core/page_pool.c >> @@ -350,6 +350,53 @@ struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp) >> } >> EXPORT_SYMBOL(page_pool_alloc_pages); >> >> +struct page *page_pool_alloc_frag(struct page_pool *pool, >> + unsigned int *offset, gfp_t gfp) >> +{ >> + unsigned int frag_offset = pool->frag_offset; >> + unsigned int frag_size = pool->p.frag_size; >> + struct page *frag_page = pool->frag_page; >> + unsigned int max_len = pool->p.max_len; >> + >> + if (!frag_page || frag_offset + frag_size > max_len) { >> + frag_page = page_pool_alloc_pages(pool, gfp); >> + if (unlikely(!frag_page)) { >> + pool->frag_page = NULL; >> + return NULL; >> + } >> + >> + pool->frag_page = frag_page; >> + frag_offset = 0; >> + >> + page_pool_sub_bias(pool, frag_page, >> + max_len / frag_size - 1); >> + } >> + >> + *offset = frag_offset; >> + pool->frag_offset = frag_offset + frag_size; >> + >> + return frag_page; >> +} >> +EXPORT_SYMBOL(page_pool_alloc_frag); > > I'm still not a fan of the fixed implementation. For the cost of the > division as I said before you could make this flexible like > page_frag_alloc_align and just decrement the bias by one per > allocation instead of trying to batch it. > > I'm sure there would likely be implementations that might need to > operate at two different sizes, for example a header and payload size. Will try to implement the frag allocation of different sizes in new version. > >> +static void page_pool_empty_frag(struct page_pool *pool) >> +{ >> + unsigned int frag_offset = pool->frag_offset; >> + unsigned int frag_size = pool->p.frag_size; >> + struct page *frag_page = pool->frag_page; >> + unsigned int max_len = pool->p.max_len; >> + >> + if (!frag_page) >> + return; >> + >> + while (frag_offset + frag_size <= max_len) { >> + page_pool_put_full_page(pool, frag_page, false); >> + frag_offset += frag_size; >> + } >> + > > Having to call this to free the page seems confusing. Rather than > reserving multiple and having to free the page multiple times I really > think you would be better off just holding one bias reservation on the > page at a time. will remove the above freeing the page multiple times. > >> + pool->frag_page = NULL; >> +} >> + >> /* Calculate distance between two u32 values, valid if distance is below 2^(31) >> * https://en.wikipedia.org/wiki/Serial_number_arithmetic#General_Solution >> */ >> @@ -670,6 +717,8 @@ void page_pool_destroy(struct page_pool *pool) >> if (!page_pool_put(pool)) >> return; >> >> + page_pool_empty_frag(pool); >> + >> if (!page_pool_release(pool)) >> return; >> >> -- >> 2.7.4 >> > . >