From: Robin Murphy <robin.murphy@xxxxxxx>
Date: Thu, 2 May 2024 16:49:48 +0100

> On 24/04/2024 9:52 am, Alexander Lobakin wrote:
>> From: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
>> Date: Tue, 23 Apr 2024 15:58:31 +0200
>>
>>> We can save a couple more function calls in the Page Pool code if we
>>> check for dma_need_sync() earlier, just when we test pp->p.dma_sync.
>>> Move both these checks into an inline wrapper and call the PP wrapper
>>> over the generic DMA sync function only when both are true.
>>> You can't cache the result of dma_need_sync() in &page_pool, as it may
>>> change anytime if an SWIOTLB buffer is allocated or mapped.
>>>
>>> Signed-off-by: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
>>> ---
>>>   net/core/page_pool.c | 31 +++++++++++++++++--------------
>>>   1 file changed, 17 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>>> index 6cf26a68fa91..87319c6365e0 100644
>>> --- a/net/core/page_pool.c
>>> +++ b/net/core/page_pool.c
>>> @@ -398,16 +398,24 @@ static struct page *__page_pool_get_cached(struct page_pool *pool)
>>>   	return page;
>>>   }
>>>
>>> -static void page_pool_dma_sync_for_device(const struct page_pool *pool,
>>> -					  const struct page *page,
>>> -					  unsigned int dma_sync_size)
>>> +static void __page_pool_dma_sync_for_device(const struct page_pool *pool,
>>> +					    const struct page *page,
>>> +					    u32 dma_sync_size)
>>>   {
>>>   	dma_addr_t dma_addr = page_pool_get_dma_addr(page);
>>>
>>>   	dma_sync_size = min(dma_sync_size, pool->p.max_len);
>>> -	dma_sync_single_range_for_device(pool->p.dev, dma_addr,
>>> -					 pool->p.offset, dma_sync_size,
>>> -					 pool->p.dma_dir);
>>> +	__dma_sync_single_for_device(pool->p.dev, dma_addr + pool->p.offset,
>>> +				     dma_sync_size, pool->p.dma_dir);
>>
>> Breh, this breaks builds on !DMA_NEED_SYNC systems, as this function
>> isn't defined there, and my CI didn't catch it in time... I'll ifdef
>> this in the next spin after some reviews for this one.
>
> Hmm, the way things have ended up, do we even need this change? (Other
> than factoring out the pool->dma_sync check, which is clearly nice)
>
> Since __page_pool_dma_sync_for_device() is never called directly, the
> flow we always get is:
>
>     page_pool_dma_sync_for_device()
>         dma_dev_need_sync()
>         __page_pool_dma_sync_for_device()
>             ... // a handful of trivial arithmetic
>             __dma_sync_single_for_device()
>
> i.e. still effectively the same order of
> "if (dma_dev_need_sync()) __dma_sync()" invocations as if we'd just
> used the standard dma_sync(), so referencing the unwrapped internals
> only spreads it out a bit more for no real benefit. As an experiment
> I tried the diff below on top, effectively undoing this problematic
> part, and it doesn't seem to make any appreciable difference in a
> before-and-after comparison of the object code, at least for my arm64
> build.
>
> Thanks,
> Robin.
>
> ----->8-----
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 27f3b6db800e..b8ab7d4ca229 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -398,24 +398,20 @@ static struct page *__page_pool_get_cached(struct page_pool *pool)
>   	return page;
>   }
>
> -static void __page_pool_dma_sync_for_device(const struct page_pool *pool,
> -					    const struct page *page,
> -					    u32 dma_sync_size)
> -{
> -	dma_addr_t dma_addr = page_pool_get_dma_addr(page);
> -
> -	dma_sync_size = min(dma_sync_size, pool->p.max_len);
> -	__dma_sync_single_for_device(pool->p.dev, dma_addr + pool->p.offset,
> -				     dma_sync_size, pool->p.dma_dir);
> -}
> -
>  static __always_inline void

So this would unconditionally inline the whole sync code into the call
sites, while I wanted to give the compilers a chance to uninline
__page_pool_dma_sync_for_device().

>  page_pool_dma_sync_for_device(const struct page_pool *pool,
>  			      const struct page *page,
>  			      u32 dma_sync_size)
>  {
> -	if (pool->dma_sync && dma_dev_need_sync(pool->p.dev))
> -		__page_pool_dma_sync_for_device(pool, page, dma_sync_size);
> +	dma_addr_t dma_addr = page_pool_get_dma_addr(page);
> +
> +	if (!pool->dma_sync)
> +		return;
> +
> +	dma_sync_size = min(dma_sync_size, pool->p.max_len);
> +	dma_sync_single_range_for_device(pool->p.dev, dma_addr,
> +					 pool->p.offset, dma_sync_size,
> +					 pool->p.dma_dir);
>  }
>
>  static bool page_pool_dma_map(struct page_pool *pool, struct page *page)

Thanks,
Olek
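
P.S. To make the tradeoff concrete, a rough sketch of the two shapes
under discussion (paraphrased from the diffs above, not the exact
kernel sources):

/*
 * Approximation of the generic helper in include/linux/dma-mapping.h
 * with CONFIG_DMA_NEED_SYNC=y: the standard dma_sync() is itself a
 * static inline that performs the dma_dev_need_sync() test before
 * calling the out-of-line __dma_sync_single_for_device(). This is why
 * Robin's diff, which returns to the standard helper, keeps the same
 * check order in the object code.
 */
static inline void dma_sync_single_for_device(struct device *dev,
		dma_addr_t addr, size_t size, enum dma_data_direction dir)
{
	if (dma_dev_need_sync(dev))
		__dma_sync_single_for_device(dev, addr, size, dir);
}

/*
 * Shape the original patch aims for: only the two cheap tests are
 * forced inline at every call site, while the min()/offset arithmetic
 * and the actual sync live in __page_pool_dma_sync_for_device(), which
 * the compiler remains free to keep out of line.
 */
static __always_inline void
page_pool_dma_sync_for_device(const struct page_pool *pool,
			      const struct page *page,
			      u32 dma_sync_size)
{
	if (pool->dma_sync && dma_dev_need_sync(pool->p.dev))
		__page_pool_dma_sync_for_device(pool, page, dma_sync_size);
}

The disagreement is thus not about whether the dma_dev_need_sync()
check gets inlined -- it does either way -- but about whether the
arithmetic is dragged into every call site along with it.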