On Tue, Oct 8, 2024 at 11:57 AM David Wei <dw@xxxxxxxxxxx> wrote:
>
> On 2024-10-04 03:34, Taehee Yoo wrote:
> > On Fri, Oct 4, 2024 at 3:43 AM Mina Almasry <almasrymina@xxxxxxxxxx> wrote:
> >>> @@ -3608,9 +3629,11 @@ static void bnxt_free_rx_rings(struct bnxt *bp)
> >>>
> >>>  static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
> >>>                                     struct bnxt_rx_ring_info *rxr,
> >>> +                                   int queue_idx,
> >>>                                     int numa_node)
> >>>  {
> >>>         struct page_pool_params pp = { 0 };
> >>> +       struct netdev_rx_queue *rxq;
> >>>
> >>>         pp.pool_size = bp->rx_agg_ring_size;
> >>>         if (BNXT_RX_PAGE_MODE(bp))
> >>> @@ -3621,8 +3644,15 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
> >>>         pp.dev = &bp->pdev->dev;
> >>>         pp.dma_dir = bp->rx_dir;
> >>>         pp.max_len = PAGE_SIZE;
> >>> -       pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
> >>> +       pp.order = 0;
> >>> +
> >>> +       rxq = __netif_get_rx_queue(bp->dev, queue_idx);
> >>> +       if (rxq->mp_params.mp_priv)
> >>> +               pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_ALLOW_UNREADABLE_NETMEM;
> >>
> >> This is not the intended use of PP_FLAG_ALLOW_UNREADABLE_NETMEM.
> >>
> >> The driver should set PP_FLAG_ALLOW_UNREADABLE_NETMEM when it's able
> >> to handle unreadable netmem; it should not worry about whether
> >> rxq->mp_params.mp_priv is set or not.
> >>
> >> You should set PP_FLAG_ALLOW_UNREADABLE_NETMEM when HDS is enabled.
> >> Let the core figure out whether mp_params.mp_priv is enabled. All the
> >> driver needs to report is whether it's configured to be able to handle
> >> unreadable netmem (which practically means HDS is enabled).
> >
> > The reason the branch exists here is that PP_FLAG_ALLOW_UNREADABLE_NETMEM
> > can't be used together with PP_FLAG_DMA_SYNC_DEV.
> >
> > 228         if (pool->slow.flags & PP_FLAG_DMA_SYNC_DEV) {
> > 229                 /* In order to request DMA-sync-for-device the page
> > 230                  * needs to be mapped
> > 231                  */
> > 232                 if (!(pool->slow.flags & PP_FLAG_DMA_MAP))
> > 233                         return -EINVAL;
> > 234
> > 235                 if (!pool->p.max_len)
> > 236                         return -EINVAL;
> > 237
> > 238                 pool->dma_sync = true; //here
> > 239
> > 240                 /* pool->p.offset has to be set according to the address
> > 241                  * offset used by the DMA engine to start copying rx data
> > 242                  */
> > 243         }
> >
> > If PP_FLAG_DMA_SYNC_DEV is set, pool->dma_sync is set to true.
> >
> > 347 int mp_dmabuf_devmem_init(struct page_pool *pool)
> > 348 {
> > 349         struct net_devmem_dmabuf_binding *binding = pool->mp_priv;
> > 350
> > 351         if (!binding)
> > 352                 return -EINVAL;
> > 353
> > 354         if (!pool->dma_map)
> > 355                 return -EOPNOTSUPP;
> > 356
> > 357         if (pool->dma_sync) //here
> > 358                 return -EOPNOTSUPP;
> > 359
> > 360         if (pool->p.order != 0)
> > 361                 return -E2BIG;
> > 362
> > 363         net_devmem_dmabuf_binding_get(binding);
> > 364         return 0;
> > 365 }
> >
> > So mp_dmabuf_devmem_init() fails when pool->dma_sync is true.
>
> This won't work for io_uring zero copy into user memory. We need all of
> PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV | PP_FLAG_ALLOW_UNREADABLE_NETMEM
> set.
>
> I agree with Mina that the driver should not be poking at the mp_priv
> fields. How about setting all the flags and then letting the mp->init()
> figure it out? mp_dmabuf_devmem_init() is called within page_pool_init(),
> so as long as it resets dma_sync if set, I don't see any issues.
>

Ah, I hadn't considered that rejecting PP_FLAG_DMA_SYNC_DEV for dmabuf
might itself be wrong. IIUC this flag requests DMA syncs between the
device and the CPU, but device memory TCP doesn't involve such syncs,
so I think we should remove this condition check from the core.
What do you think?
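If I understand David's suggestion correctly, the core-side change could
look roughly like this (untested sketch against the code quoted above;
it assumes the dmabuf provider can simply opt out of syncing rather
than rejecting the flag):

int mp_dmabuf_devmem_init(struct page_pool *pool)
{
        struct net_devmem_dmabuf_binding *binding = pool->mp_priv;

        if (!binding)
                return -EINVAL;

        if (!pool->dma_map)
                return -EOPNOTSUPP;

        /* Dmabuf memory is never sync'd for the CPU, so instead of
         * failing with -EOPNOTSUPP when the driver requested
         * PP_FLAG_DMA_SYNC_DEV, just opt this pool out of syncing.
         */
        pool->dma_sync = false;

        if (pool->p.order != 0)
                return -E2BIG;

        net_devmem_dmabuf_binding_get(binding);
        return 0;
}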
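With that in place, bnxt could drop the mp_priv check entirely and just
advertise everything it supports, as Mina suggested (again untested;
the queue_idx/rxq lookup in my patch would no longer be needed):

        pp.dev = &bp->pdev->dev;
        pp.dma_dir = bp->rx_dir;
        pp.max_len = PAGE_SIZE;
        pp.order = 0;
        /* Advertise all capabilities unconditionally (with
         * PP_FLAG_ALLOW_UNREADABLE_NETMEM gated on HDS being enabled)
         * and let the core/memory provider decide what takes effect.
         */
        pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV |
                   PP_FLAG_ALLOW_UNREADABLE_NETMEM;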
> >
> > tcp-data-split can be used in normal cases, not only the devmem TCP
> > case. If we enable tcp-data-split and disable devmem TCP, the
> > page_pool doesn't have PP_FLAG_DMA_SYNC_DEV.
> > So I think mp_params.mp_priv is still useful.
> >
> > Thanks a lot,
> > Taehee Yoo
> >
> >>
> >>
> >> --
> >> Thanks,
> >> Mina