On Mon, Mar 25, 2024 at 12:22:15AM +0100, Christoph Hellwig wrote:
> On Fri, Mar 22, 2024 at 03:43:30PM -0300, Jason Gunthorpe wrote:
> > If we are going to make caller provided uniformity a requirement, let's
> > imagine a formal memory type idea to help keep this a little
> > abstracted?
> >
> > DMA_MEMORY_TYPE_NORMAL
> > DMA_MEMORY_TYPE_P2P_NOT_ACS
> > DMA_MEMORY_TYPE_ENCRYPTED
> > DMA_MEMORY_TYPE_BOUNCE_BUFFER // ??
> >
> > Then maybe the driver flow looks like:
> >
> > if (transaction.memory_type == DMA_MEMORY_TYPE_NORMAL && dma_api_has_iommu(dev)) {
>
> Add a nice helper to make this somewhat readable, but yes.
>
> > } else if (transaction.memory_type == DMA_MEMORY_TYPE_P2P_NOT_ACS) {
> >        num_hwsgls = transaction.num_sgls;
> >        for_each_range(transaction, range) {
> >               hwsgl[i].addr = dma_api_p2p_not_acs_map(range.start_physical, range.length, p2p_memory_provider);
> >               hwsgl[i].len = range.size;
> >        }
> > } else {
> >        /* Must be DMA_MEMORY_TYPE_NORMAL, DMA_MEMORY_TYPE_ENCRYPTED, DMA_MEMORY_TYPE_BOUNCE_BUFFER? */
> >        num_hwsgls = transaction.num_sgls;
> >        for_each_range(transaction, range) {
> >               hwsgl[i].addr = dma_api_map_cpu_page(range.start_page, range.length);
> >               hwsgl[i].len = range.size;
> >        }
>
> And these two are really the same except that we call a different map
> helper underneath. So I think as far as the driver is concerned
> they should be the same, the DMA API just needs to key off the
> memory type.

Yeah.. If the caller is going to have to compute the memory type of the
range then let's pass it to the helper:

  dma_api_map_memory_type(transaction.memory_type, range.start_page, range.length);

Then we can just hide all the differences under the API without doing
duplicated work. Function names need some work ...

> > > > So I take it as a requirement that RDMA MUST make single MR's out of a
> > > > hodgepodge of page types. RDMA MRs cannot be split. Multiple MR's are
> > > > not a functional replacement for a single MR.
> > >
> > > But MRs consolidate multiple dma addresses anyway.
> >
> > I'm not sure I understand this?
>
> The RDMA MRs take a list of PFNish addresses (or SGLs with the
> enhanced MRs from Mellanox) and give you back a single rkey/lkey.

Yes, that is the desire.

> > To go back to my main thesis - I would like a high performance low
> > level DMA API that is capable enough that it could implement
> > scatterlist dma_map_sg() and thus also implement any future
> > scatterlist_v2, bio, hmm_range_fault or any other thing we come up
> > with on top of it. This is broadly what I thought we agreed to at LSF
> > last year.
>
> I think the biggest underlying problem of the scatterlist based
> DMA implementation for IOMMUs is that it's trying to handle too much,
> that is magic coalescing even if the segment boundaries don't align
> with the IOMMU page size. If we can get rid of that misfeature I
> think we'd greatly simplify the API and implementation.

Yeah, that stuff is not easy at all and takes extra computation to
figure out. I always assumed it was there for block...

Leon & Chaitanya will make an RFC v2 along these lines, let's see how it
goes.

Thanks,
Jason