From: Jakub Kicinski <kuba@xxxxxxxxxx>
Date: Wed, 14 Jun 2023 10:19:54 -0700

> On Mon, 12 Jun 2023 21:02:55 +0800 Yunsheng Lin wrote:
>>  	struct page_pool_params pp_params = {
>> -		.flags = PP_FLAG_DMA_MAP | PP_FLAG_PAGE_FRAG |
>> -			 PP_FLAG_DMA_SYNC_DEV,
>> +		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
>>  		.order = hns3_page_order(ring),
>
> Does hns3_page_order() set a good example for the users?
>
> static inline unsigned int hns3_page_order(struct hns3_enet_ring *ring)
> {
> #if (PAGE_SIZE < 8192)
> 	if (ring->buf_size > (PAGE_SIZE / 2))
> 		return 1;
> #endif
> 	return 0;
> }

Oh lol, just what the Intel drivers do. They don't have a pool to keep a
bunch of pages (they can recycle a page only within its own buffer), so
in order to still recycle them, they allocate order-1 pages to be able
to flip the halves >_<

> Why allocate order 1 pages for buffers which would fit in a single page?
> I feel like this sort of heuristic should be built into the API itself.

Offtop:

I tested this series with IAVF: very little perf regression* (almost
within stddev) compared to the plain 1-page-per-frame Page Pool series,
but 21 MB less RAM taken compared to both the "old" PP series and the
baseline, nice :D
(+Cc David Christensen, he'll be glad to hear we're stopping eating
64 KB pages)

* this might be caused by the fact that in the previous version I was
hardcoding truesize, while now it depends on what page_pool_alloc()
returns. Same for the Rx offset: previously it was always 0, as every
frame was placed at the start of a page; now it depends on how PP
places** it. With an MTU of 1500 and no XDP, two frames fit into one 4k
page. With XDP on (increased headroom) or an increased MTU, PP
effectively starts doing 1-frame-per-page with literally no change in
performance (with the obvious increase in RAM usage -- I mean, it gets
restored to the baseline numbers).

** BTW, instead of 2048 + 2048, I'm getting 1920 + 2176. Maybe the stack
would be happier to see a more consistent truesize for cache purposes.
I'll try to play with it.

Thanks,
Olek
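
P.S. For anyone curious what the "flip the halves" recycling in the
Intel drivers looks like, here is a simplified sketch (not the actual
ixgbe/iavf code -- the real drivers also keep a pagecnt_bias and check
the page's NUMA node before reusing it; the struct and function names
below are made up for illustration):

	#include <linux/mm.h>

	/* Each Rx buffer owns one page (order-1 on 4k-page systems when
	 * the buffer doesn't fit in half a page) and alternates between
	 * its two halves, so the half the NIC writes next is never the
	 * one the stack may still be reading from.
	 */
	struct rx_half_buf {
		struct page	*page;		/* page backing this buffer */
		unsigned int	page_offset;	/* 0 or truesize, i.e. which half */
	};

	static void rx_half_buf_flip(struct rx_half_buf *buf,
				     unsigned int truesize)
	{
		/* Point the next DMA at the other half; the previous half
		 * may still be referenced by an skb until the stack
		 * consumes it.
		 */
		buf->page_offset ^= truesize;
	}

	static bool rx_half_buf_reusable(const struct rx_half_buf *buf)
	{
		/* Only reuse when the driver holds the last reference;
		 * the real drivers avoid touching the refcount on every
		 * frame by tracking a pagecnt_bias instead.
		 */
		return page_ref_count(buf->page) == 1;
	}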
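
And on the point about the heuristic belonging in the API: from the
driver side, with this series it roughly reduces to the sketch below --
the driver requests buf_size bytes and lets the pool decide whether to
hand out a page fragment or a whole page. This assumes the
page_pool_alloc(pool, &offset, &size, gfp) helper from the series, where
@size is in/out (requested size in, truesize out); demo_rx_buf and
demo_rx_alloc are made-up names:

	#include <net/page_pool.h>

	struct demo_rx_buf {
		struct page	*page;
		unsigned int	offset;
		unsigned int	truesize;
	};

	static int demo_rx_alloc(struct page_pool *pool,
				 struct demo_rx_buf *buf,
				 unsigned int buf_size)
	{
		unsigned int offset;
		unsigned int size = buf_size;	/* in: requested, out: truesize */
		struct page *page;

		page = page_pool_alloc(pool, &offset, &size, GFP_ATOMIC);
		if (!page)
			return -ENOMEM;

		buf->page = page;
		buf->offset = offset;
		/* Truesize is whatever the pool carved out, e.g. 1920 or
		 * 2176 on a 4k page, rather than a hardcoded 2048.
		 */
		buf->truesize = size;
		return 0;
	}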