On Mon, Mar 18, 2024, at 16:13, Niklas Cassel wrote:
> On Mon, Mar 18, 2024 at 08:25:36AM +0100, Arnd Bergmann wrote:
>
> I personally just care about pci-epf-test, but obviously I don't
> want to regress any other user of pci_epf_alloc_space().
>
> Looking at the endpoint side driver:
> drivers/pci/endpoint/functions/pci-epf-test.c
> and the host side driver:
> drivers/misc/pci_endpoint_test.c
>
> On the RC side, allocating buffers that the EP will DMA to is
> done using: kzalloc() + dma_map_single().
>
> On EP side:
> drivers/pci/endpoint/functions/pci-epf-test.c
> uses dma_map_single() when using DMA, and signals completion using MSI.
>
> On EP side:
> When reading/writing to the BARs, it simply does:
> READ_ONCE()/WRITE_ONCE():
> https://github.com/torvalds/linux/blob/v6.8/drivers/pci/endpoint/functions/pci-epf-test.c#L643-L648
>
> There is no dma_sync(), so the pci-test-epf driver currently seems to
> depend on the backing memory being allocated by dma_alloc_coherent().

From my reading of that function, this is really some kind of
command buffer that implements individual structured registers
and can be accessed from both sides at the same time, so
it would not actually make sense with the streaming interface
and wc/prefetchable access in place of explicit READ_ONCE/WRITE_ONCE
and readl/writel accesses.

>> If you don't care about ordering on that level, I would use
>> dma_map_sg() on the endpoint side and prefetchable mapping on
>> the host side, with the endpoint using dma_sync_*() to pass
>> buffer ownership between the two sides, as controlled by some
>> other communication method (non-prefetchable BAR, MSI, ...).
>
> I don't think that there is no big reason why pci-epf-test is
> implemented using dma_alloc_coherent() rather than dma_sync()
> for the memory backing the BARs, but that is the way it is.
>
> Since I don't feel like totally rewriting pci-epf-test, and since
> you say that we shouldn't use dma_alloc_coherent() for the memory
> backing the BARs together with exporting the BAR as prefetchable,
> I will drop this patch from the series in the next revision.

Ok. It might still be useful to extend the driver to also allow
transferring streaming data through a BAR on the endpoint side.
From what I can tell, it currently supports using either slave
DMA or an RC side buffer that is ioremapped into the endpoint,
but that uses a regular ioremap() as well. Mapping the RC
side buffer as WC should make it possible to transfer data
from EP to RC more efficiently, but for the RC to EP transfers
you really want the buffer to be allocated on the EP, so you
can ioremap_wc() it to the RC for a memcpy_toio(), or cacheable
read from the EP.

      Arnd