On Wed, Jan 15, 2025 at 10:32:34AM +0100, Christoph Hellwig wrote:
> On Wed, Jan 15, 2025 at 09:55:29AM +0100, Simona Vetter wrote:
> > I think for 90% of exporters pfn would fit, but there's some really funny
> > ones where you cannot get a cpu pfn by design. So we need to keep the
> > pfn-less interfaces around. But ideally for the pfn-capable exporters we'd
> > have helpers/common code that just implements all the other interfaces.
>
> There is no way to have dma address without a PFN in Linux right now.
> How would you generate them? That implies you have an IOMMU that can
> generate IOVAs for something that doesn't have a physical address at
> all.
>
> Or do you mean some that don't have pages associated with them, and
> thus have pfn_valid fail on them? They still have a PFN, just not
> one that is valid to use in most of the Linux MM.

He is talking about private interconnect hidden inside clusters of
devices. Ie the system may have many GPUs and those GPUs have their own
private interconnect between them. It is not PCI, and packets don't
transit through the CPU SOC at all, so the IOMMU is not involved.

DMA can happen on that private interconnect, but from a Linux
perspective it is not DMA API DMA, and the addresses used to describe
it are not part of the CPU address space. The initiating device will
have a way to choose which path the DMA goes through when setting up
the DMA.

Effectively, if you look at one of these complex GPU systems you will
have a physical bit of memory, say HBM memory located on the GPU. Then
from an OS perspective we have a whole bunch of different
representations/addresses of that very same memory. A Grace/Hopper
system would have at least three different addresses (ZONE_MOVABLE, a
PCI MMIO aperture, and a global NVLink address). Each different address
effectively represents a different physical interconnect multipath, and
an initiator may have three different routes/addresses available to
reach the same physical target memory.

Part of what DMABUF needs to do is pick which multipath will be used
between exporter/importer.

So, the hack today has the DMABUF exporter GPU driver understand that
the importer is part of the private interconnect and then generate a
scatterlist with a NULL sg_page, but an sg_dma_addr that encodes the
private global address on the hidden interconnect. Somehow the importer
knows this has happened and programs its HW to use the private path.
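Very roughly, the exporter side of that hack has the shape of the sketch
below. The fake_priv_* names and the priv layout are made up purely for
illustration, not any real driver's code; the only point is the NULL
sg_page plus the raw interconnect address stuffed into sg_dma_address():

#include <linux/dma-buf.h>
#include <linux/err.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

/*
 * Hypothetical exporter-private state; stands in for however the GPU
 * driver actually tracks its HBM allocation.
 */
struct fake_priv_mem {
	dma_addr_t private_global_addr;	/* address on the hidden interconnect */
	unsigned int size;
};

static struct sg_table *
fake_priv_map_dma_buf(struct dma_buf_attachment *attach,
		      enum dma_data_direction dir)
{
	struct fake_priv_mem *mem = attach->dmabuf->priv;
	struct sg_table *sgt;

	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
	if (!sgt)
		return ERR_PTR(-ENOMEM);

	if (sg_alloc_table(sgt, 1, GFP_KERNEL)) {
		kfree(sgt);
		return ERR_PTR(-ENOMEM);
	}

	/*
	 * No struct page backs this entry, and the "dma address" never
	 * went near the DMA API or an IOMMU; it is only meaningful to a
	 * peer wired to the same private interconnect.
	 */
	sg_set_page(sgt->sgl, NULL, mem->size, 0);
	sg_dma_address(sgt->sgl) = mem->private_global_addr;
	sg_dma_len(sgt->sgl) = mem->size;

	return sgt;
}

The importer then has to know, through some out-of-band agreement with
this particular exporter, that sg_dma_address() here is not a DMA API
address at all and program its HW to route through the private path.

Jason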