On Mon, Mar 18, 2024 at 04:49:07PM +0100, Arnd Bergmann wrote:
> On Mon, Mar 18, 2024, at 16:13, Niklas Cassel wrote:
> > On Mon, Mar 18, 2024 at 08:25:36AM +0100, Arnd Bergmann wrote:
> >
> > I personally just care about pci-epf-test, but obviously I don't
> > want to regress any other user of pci_epf_alloc_space().
> >
> > Looking at the endpoint side driver:
> > drivers/pci/endpoint/functions/pci-epf-test.c
> > and the host side driver:
> > drivers/misc/pci_endpoint_test.c
> >
> > On the RC side, allocating buffers that the EP will DMA to is
> > done using: kzalloc() + dma_map_single().
> >
> > On EP side:
> > drivers/pci/endpoint/functions/pci-epf-test.c
> > uses dma_map_single() when using DMA, and signals completion using MSI.
> >
> > On EP side:
> > When reading/writing to the BARs, it simply does:
> > READ_ONCE()/WRITE_ONCE():
> > https://github.com/torvalds/linux/blob/v6.8/drivers/pci/endpoint/functions/pci-epf-test.c#L643-L648
> >
> > There is no dma_sync(), so the pci-epf-test driver currently seems to
> > depend on the backing memory being allocated by dma_alloc_coherent().
>
> From my reading of that function, this is really some kind
> of command buffer that implements individual structured
> registers and can be accessed from both sides at the same
> time, so it would not actually make sense with the streaming
> interface and wc/prefetchable access in place of explicit
> READ_ONCE/WRITE_ONCE and readl/writel accesses.
>

Right. We should stick to the current implementation for now until a function
driver with a streaming DMA use case comes in.

- Mani

> >> If you don't care about ordering on that level, I would use
> >> dma_map_sg() on the endpoint side and prefetchable mapping on
> >> the host side, with the endpoint using dma_sync_*() to pass
> >> buffer ownership between the two sides, as controlled by some
> >> other communication method (non-prefetchable BAR, MSI, ...).
> >
> > I don't think that there is any big reason why pci-epf-test is
> > implemented using dma_alloc_coherent() rather than dma_sync()
> > for the memory backing the BARs, but that is the way it is.
> >
> > Since I don't feel like totally rewriting pci-epf-test, and since
> > you say that we shouldn't use dma_alloc_coherent() for the memory
> > backing the BARs together with exporting the BAR as prefetchable,
> > I will drop this patch from the series in the next revision.
>
> Ok. It might still be useful to extend the driver to also
> allow transferring streaming data through a BAR on the
> endpoint side. From what I can tell, it currently supports
> using either slave DMA or a RC side buffer that is ioremapped
> into the endpoint, but that uses a regular ioremap() as well.
> Mapping the RC side buffer as WC should make it possible to
> transfer data from EP to RC more efficiently, but for the RC
> to EP transfers you really want the buffer to be allocated on
> the EP, so you can ioremap_wc() it to the RC for a memcpy_toio,
> or cacheable read from the EP.
>
> Arnd

-- 
மணிவண்ணன் சதாசிவம்
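
For reference, the "command buffer with structured registers" model
discussed above can be sketched in plain userspace C. This is only an
illustration, not the pci-epf-test code: the register layout (struct
model_regs), the MODEL_* constants, and the helpers reg_read(),
reg_write(), host_submit() and ep_poll() are all hypothetical names made
up for this sketch, and the volatile accessors merely stand in for the
kernel's READ_ONCE()/WRITE_ONCE(). It assumes both sides see the same
coherent memory, which is what dma_alloc_coherent() would provide on
real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register block shared by host (RC) and endpoint (EP). */
struct model_regs {
	uint32_t command;
	uint32_t status;
	uint32_t size;
	uint32_t checksum;
};

enum { MODEL_CMD_NONE = 0, MODEL_CMD_SUM = 1 };
enum { MODEL_STATUS_DONE = 1u << 0 };

/* Userspace stand-ins for READ_ONCE()/WRITE_ONCE(): one volatile access each. */
static inline uint32_t reg_read(const volatile uint32_t *reg)
{
	return *reg;
}

static inline void reg_write(volatile uint32_t *reg, uint32_t val)
{
	*reg = val;
}

/* Host side: set up the arguments first, write the command register last. */
static void host_submit(struct model_regs *regs, uint32_t size)
{
	reg_write(&regs->status, 0);
	reg_write(&regs->size, size);
	reg_write(&regs->command, MODEL_CMD_SUM);
}

/*
 * Endpoint side: one polling step. Returns 1 if a command was handled,
 * 0 if the command register was empty.
 */
static int ep_poll(struct model_regs *regs, const uint8_t *buf)
{
	uint32_t cmd = reg_read(&regs->command);
	uint32_t n, i, sum = 0;

	if (cmd == MODEL_CMD_NONE)
		return 0;

	/* Consume the command so it is not handled twice. */
	reg_write(&regs->command, MODEL_CMD_NONE);

	n = reg_read(&regs->size);
	for (i = 0; i < n; i++)
		sum += buf[i];

	reg_write(&regs->checksum, sum);
	reg_write(&regs->status, MODEL_STATUS_DONE);
	return 1;
}
```

Every field is read or written individually and either side may touch
the block at any time, so there is no point at which buffer ownership
could be handed over with dma_sync_*(). That is the reason a streaming
mapping with wc/prefetchable access does not fit this kind of register
block.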