On Mon, Mar 18, 2024 at 07:44:21AM +0100, Arnd Bergmann wrote:
> On Mon, Mar 18, 2024, at 05:30, Manivannan Sadhasivam wrote:
> > On Fri, Mar 15, 2024 at 06:29:52PM +0100, Arnd Bergmann wrote:
> >> On Fri, Mar 15, 2024, at 07:44, Manivannan Sadhasivam wrote:
> >> > On Wed, Mar 13, 2024 at 11:58:01AM +0100, Niklas Cassel wrote:
> >
> > But I'm not sure I got the answer I was looking for. So let me rephrase my
> > question a bit.
> >
> > For BAR memory, the PCIe spec states:
> >
> > 'A PCI Express Function requesting Memory Space through a BAR must set the BAR's
> > Prefetchable bit unless the range contains locations with read side effects or
> > locations in which the Function does not tolerate write merging'
> >
> > So here, the spec refers to the backing memory allocated on the endpoint side
> > as the 'range', i.e., the BAR memory allocated on the host that gets mapped on
> > the endpoint.
> >
> > Currently on the endpoint side, we use dma_alloc_coherent() to allocate the
> > memory for each BAR and map it using the iATU.
> >
> > So I want to know whether the memory range allocated on the endpoint through
> > dma_alloc_coherent() satisfies the above two conditions of the PCIe spec on
> > all architectures:
> >
> > 1. No read side effects
> > 2. Tolerates write merging
> >
> > I believe the reason we allocate coherent memory on the endpoint in the first
> > place is that not all PCIe controllers are DMA coherent, as you said above.
>
> As far as I can tell, we never have read side effects for memory
> backed BARs, but the write merging is something that depends on
> how the memory is used:
>
> If you have anything in that memory that relies on ordering,
> you probably want to map it as coherent on the endpoint side,
> and non-prefetchable on the host controller side, and then
> use the normal rmb()/wmb() barriers on both ends between
> serialized accesses. An example of this would be having blocks
> of data separate from metadata that says whether the data is
> valid.
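A rough userspace analogue of the data/valid-flag pattern described above, as an illustration only and not endpoint driver code: a plain struct stands in for the shared BAR memory, and C11 release/acquire fences loosely stand in for the kernel's wmb()/rmb(). The layout and the names bar_region, region_init(), publish() and consume() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAYLOAD_LEN 64

/* Hypothetical layout: a block of data plus a "valid" flag (the
 * metadata) telling the other side whether the data may be read. */
struct bar_region {
	unsigned char data[PAYLOAD_LEN];
	atomic_uint valid;
};

void region_init(struct bar_region *r)
{
	assert(r != NULL);
	memset(r->data, 0, sizeof(r->data));
	atomic_init(&r->valid, 0);
}

/* Writer: fill the data first, then publish it by setting the flag.
 * The release fence loosely plays the role of wmb(): the data stores
 * cannot become visible after the flag store. Returns false if the
 * reader has not handed the buffer back yet. */
bool publish(struct bar_region *r, const unsigned char *src, size_t len)
{
	assert(r != NULL && src != NULL);
	if (atomic_load_explicit(&r->valid, memory_order_acquire))
		return false;
	if (len > PAYLOAD_LEN)
		len = PAYLOAD_LEN;
	memcpy(r->data, src, len);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&r->valid, 1, memory_order_relaxed);
	return true;
}

/* Reader: check the flag first, then read the data. The acquire fence
 * loosely plays the role of rmb(): the data loads cannot be satisfied
 * before the flag load. Returns false if nothing is published. */
bool consume(struct bar_region *r, unsigned char *dst, size_t len)
{
	assert(r != NULL && dst != NULL);
	if (!atomic_load_explicit(&r->valid, memory_order_relaxed))
		return false;
	atomic_thread_fence(memory_order_acquire);
	if (len > PAYLOAD_LEN)
		len = PAYLOAD_LEN;
	memcpy(dst, r->data, len);
	/* Release: the writer must not overwrite data still being read. */
	atomic_store_explicit(&r->valid, 0, memory_order_release);
	return true;
}
```

In the setup described above, the same ordering would instead come from wmb()/rmb() on memory that is mapped coherent on the endpoint and non-prefetchable on the host, not from C11 fences.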
>
> If you don't care about ordering on that level, I would use
> dma_map_sg() on the endpoint side and prefetchable mapping on
> the host side, with the endpoint using dma_sync_*() to pass
> buffer ownership between the two sides, as controlled by some
> other communication method (non-prefetchable BAR, MSI, ...).
>

Right now, only the Test driver and a couple of NTB drivers use the
pci_epf_alloc_space() API, and they do not need streaming DMA.

So to conclude, we should live with coherent allocation/non-prefetchable
BARs for now, and extend this to streaming DMA/prefetchable BARs once we
have a function driver that needs it.

Thanks a lot for your inputs!

- Mani

-- 
மணிவண்ணன் சதாசிவம்