On Tue, Dec 10, 2024 at 7:47 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>
> On Mon, 9 Dec 2024 17:23:07 +0000 Mina Almasry wrote:
> > -static inline void page_pool_dma_sync_for_cpu(const struct page_pool *pool,
> > -					      const struct page *page,
> > -					      u32 offset, u32 dma_sync_size)
> > +static inline void
> > +page_pool_dma_sync_netmem_for_cpu(const struct page_pool *pool,
> > +				  const netmem_ref netmem, u32 offset,
> > +				  u32 dma_sync_size)
> >  {
> > +	if (pool->mp_priv)
>
> Let's add a dedicated bit to skip sync. The io-uring support feels
> quite close. Let's not force those guys to have to rejig this.
>

OK.

> > +		return;
> > +
> >  	dma_sync_single_range_for_cpu(pool->p.dev,
> > -				      page_pool_get_dma_addr(page),
> > +				      page_pool_get_dma_addr_netmem(netmem),
> >  				      offset + pool->p.offset, dma_sync_size,
> >  				      page_pool_get_dma_dir(pool));
> >  }
> >
> > +static inline void page_pool_dma_sync_for_cpu(const struct page_pool *pool,
> > +					      struct page *page, u32 offset,
> > +					      u32 dma_sync_size)
> > +{
> > +	page_pool_dma_sync_netmem_for_cpu(pool, page_to_netmem(page), offset,
> > +					  dma_sync_size);
>
> I have the feeling Olek won't thank us for this extra condition and
> bit clearing. If driver calls page_pool_dma_sync_for_cpu() we don't
> have to check the new bit / mp_priv. Let's copy & paste the
> dma_sync_single_range_for_cpu() call directly here.

page_pool_get_dma_addr() also does a cast to netmem and bit clearing :/

The whole netmem stuff was written to maximize code reuse. We don't
really special case pages for performance; we convert pages to netmem
and then pipe them through common code paths.

I can special case pages here, but we would also need to copy the
implementation of page_pool_get_dma_addr() as well. Note the tradeoff
is some code duplication. It seems from the discussion that it's worth
it, which is fine by me.

--
Thanks,
Mina
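
[Editor's note: for readers following the thread, the page-only fast path
Jakub is suggesting might look roughly like the sketch below. This is a
hedged illustration pieced together from the quoted diff, not the merged
code; and per Mina's caveat, page_pool_get_dma_addr() itself still does a
cast to netmem and bit clearing, so a fully specialized version would need
to duplicate that helper's implementation too.]

```c
/* Sketch only: a page-specialized page_pool_dma_sync_for_cpu() that
 * inlines the dma_sync_single_range_for_cpu() call directly, as Jakub
 * suggests, instead of routing through the netmem variant.
 */
static inline void page_pool_dma_sync_for_cpu(const struct page_pool *pool,
					      struct page *page, u32 offset,
					      u32 dma_sync_size)
{
	/* A plain struct page never comes from a memory provider, so no
	 * mp_priv / skip-sync check is needed on this path.
	 *
	 * Caveat from the thread: page_pool_get_dma_addr() internally
	 * casts to netmem and clears the low bit, so avoiding that cost
	 * entirely would also require a page-only copy of that helper.
	 */
	dma_sync_single_range_for_cpu(pool->p.dev,
				      page_pool_get_dma_addr(page),
				      offset + pool->p.offset, dma_sync_size,
				      page_pool_get_dma_dir(pool));
}
```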