On 2024-10-15 07:29, Pavel Begunkov wrote:
> On 10/14/24 23:38, Mina Almasry wrote:
>> On Sat, Oct 12, 2024 at 2:42 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>>>
>>> On Fri, Oct 11, 2024 at 10:33:43AM -0700, Mina Almasry wrote:
>>>> On Thu, Oct 10, 2024 at 6:34 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>>>>>
>>>>> On Thu, 10 Oct 2024 10:44:38 -0700 Mina Almasry wrote:
>>>>>>>> I hadn't considered that failing on PP_FLAG_DMA_SYNC_DEV
>>>>>>>> for dmabuf may be wrong.
>>>>>>>> I think device memory TCP is not related to this flag.
>>>>>>>> So the device memory TCP core API should not return failure when
>>>>>>>> the PP_FLAG_DMA_SYNC_DEV flag is set.
>>>>>>>> How about removing this condition check from the device memory
>>>>>>>> TCP core?
>>>>>>>
>>>>>>> I think we need to invert the check..
>>>>>>> Mina, WDYT?
>>>>>>
>>>>>> On a closer look, my feeling is similar to Taehee's:
>>>>>> PP_FLAG_DMA_SYNC_DEV should be orthogonal to memory providers. The
>>>>>> memory providers allocate the memory and provide the dma-addr, but
>>>>>> need not dma-sync the dma-addr, right? The driver can sync the
>>>>>> dma-addr if it wants, and the driver can delegate the syncing to
>>>>>> the pp via PP_FLAG_DMA_SYNC_DEV if it wants. AFAICT the check
>>>>>> should be removed, not inverted, but I could be missing something.
>>>>>
>>>>> I don't know much about dmabuf, but it hinges on the question of
>>>>> whether doing a DMA sync for device on a dmabuf address is:
>>>>> - a good thing
>>>>> - a noop
>>>>> - a bad thing
>>>>>
>>>>> If it's a good thing or a noop - agreed.
>>>>>
>>>>> Similar question for the sync for CPU.
>>>>>
>>>>> I agree that intuitively it should all be fine. But the fact that
>>>>> dmabuf has a bespoke API for accessing the memory by the CPU makes
>>>>> me worried that there may be assumptions about these addresses not
>>>>> getting randomly fed into the normal DMA API..
>>>>
>>>> Sorry, I'm also a bit unsure what the right thing to do here is. The
>>>> code that we've been running in GVE does a dma-sync for cpu
>>>> unconditionally on RX for dma-buf and non-dmabuf dma-addrs, and we
>>>> haven't been seeing issues. It never does a dma-sync for device.
>>>>
>>>> My first question is: why is a dma-sync for device needed on the RX
>>>> path at all for some drivers in the first place? For incoming
>>>> (non-dmabuf) data, the data is written by the device and read by the
>>>> cpu, so a sync for cpu is really what's needed. Is the sync for
>>>> device for XDP? Or is it that buffers should be dma-sync'd for
>>>> device before they are re-posted to the NIC?
>>>>
>>>> Christian/Jason, sorry, quick question: are
>>>> dma_sync_single_for_{device|cpu} needed or wanted when the dma-addrs
>>>> come from a dma-buf? Or are these dma-addrs to be treated like any
>>>> other, with the normal dma_sync_for_{device|cpu} rules?
>>>
>>> Um, I think because dma-buf hacks things up and generates illegal
>>> scatterlist entries with weird dma_map_resource() addresses for the
>>> typical P2P case, the dma sync API should not be used on those
>>> things.
>>>
>>> However, there is no way to know if the dma-buf has done this, and
>>> there are valid cases where the scatterlist is not ill-formed and the
>>> sync is necessary.
>>>
>>> We are getting so close to being able to start fixing these API
>>> issues in dmabuf; I hope next cycle we can begin.. Fingers crossed.
>>>
>>> From a CPU architecture perspective you do not need to cache flush
>>> PCI MMIO BAR memory, and perhaps doing so might be problematic on
>>> some arches (???). But you do need to flush normal cacheable CPU
>>> memory if that is in the DMA buf.
>>>
>>
>> Thanks Jason. In that case I agree with Jakub that we should take in
>> his change here:
>>
>> https://lore.kernel.org/netdev/20241009170102.1980ed1d@xxxxxxxxxx/
>>
>> With this change the driver would delegate dma_sync_for_device to the
>> page_pool, and the page_pool will skip it altogether for the dma-buf
>> memory provider.
>
> Requiring ->dma_map should be common to all providers, as the page
> pool shouldn't be dipping into net_iovs to figure out how to map them.
> However, looking at this discussion, it seems that the ->dma_sync
> concern is devmem-specific and should be discarded by pp providers
> using dmabufs, i.e. in devmem.c:mp_dmabuf_devmem_init().

Yes, that's my preference as well, see my earlier reply.
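For readers following along, here is roughly what the two halves of
that delegation look like from a driver. A minimal sketch, not code
from GVE or any real driver; the function names and size values are
placeholders:

#include <linux/dma-mapping.h>
#include <linux/mm.h>
#include <linux/numa.h>
#include <net/page_pool/helpers.h>
#include <net/page_pool/types.h>

/* Setup sketch: PP_FLAG_DMA_MAP asks the pool to map the buffers,
 * PP_FLAG_DMA_SYNC_DEV delegates the sync-for-device to the pool when
 * buffers are recycled back towards the NIC. With Jakub's change, a
 * pool backed by the dmabuf memory provider would skip that sync
 * internally instead of rejecting the flag.
 */
static struct page_pool *rxq_page_pool_create(struct device *dev)
{
	struct page_pool_params pp_params = {
		.flags		= PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
		.order		= 0,
		.pool_size	= 1024,		/* placeholder ring size */
		.nid		= NUMA_NO_NODE,
		.dev		= dev,
		.dma_dir	= DMA_FROM_DEVICE,
		.max_len	= PAGE_SIZE,	/* sync at most one page */
		.offset		= 0,
	};

	return page_pool_create(&pp_params);
}

/* RX completion sketch, mirroring what GVE reportedly does: sync the
 * buffer for the CPU before the stack reads it; there is no
 * sync-for-device on this path.
 */
static void rxq_sync_desc_for_cpu(struct device *dev, dma_addr_t addr,
				  unsigned int len)
{
	dma_sync_single_for_cpu(dev, addr, len, DMA_FROM_DEVICE);
}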
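And a sketch of the provider-side variant suggested above, shaped like
devmem.c:mp_dmabuf_devmem_init(). The dma_map/dma_sync bits are the
pp-internal copies of the driver flags; treat the body as
illustrative, not an actual patch:

/* net/core/devmem.c (sketch): keep requiring that the pool maps the
 * buffers, but discard the driver's sync-for-device request instead
 * of failing on it, since dmabuf dma-addrs must not be fed into
 * dma_sync_single_for_device().
 */
int mp_dmabuf_devmem_init(struct page_pool *pool)
{
	struct net_devmem_dmabuf_binding *binding = pool->mp_priv;

	if (!binding)
		return -EINVAL;

	if (!pool->dma_map)
		return -EOPNOTSUPP;

	/* Previously: if (pool->dma_sync) return -EOPNOTSUPP; */
	pool->dma_sync = false;

	/* ... remainder of the provider init unchanged ... */
	return 0;
}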