On Wed, Mar 06, 2024 at 03:44:16PM +0100, Christoph Hellwig wrote:
> Except that the flows are fundamentally different for the "can coalesce"
> vs "can't coalesce" case. In the former we have one dma_addr_t range,
> and in the latter as many as there are input vectors (this is ignoring
> the weird iommu merging case where we coalesce some but not all
> segments, but I'd rather not have that in a new API).

I don't think they are so fundamentally different. At least in our past
conversations I never came away with the idea we should burden the
driver with two different flows based on what kind of alignment the
transfer happens to have.

Certainly if we split the API to focus one API on doing only
page-aligned transfers the aligned part does become a little simpler.
At least the RDMA drivers could productively use just a page-aligned
interface. But I didn't think this would make BIO users happy so I never
even thought about it..

> The total transfer size should just be passed in by the callers and
> be known, and there should be no offset.

The API needs the caller to figure out the total number of IOVA pages it
needs, rounding up the CPU ranges to full aligned pages. That becomes
the IOVA allocation.

offset is something that arises to support non-aligned transfers.

> So if we want to efficiently be able to handle these cases we need
> two APIs in the driver and a good framework to switch between them.

But what does the non-page-aligned version look like? Doesn't it still
look basically like this?

And what is the actual difference if the input is aligned? The caller
can assume it doesn't need to provide a per-range dma_addr_t during
unmap. It still can't assume the HW programming will be linear due to
the P2P !ACS support. And it still has to call an API per CPU range to
actually program the IOMMU.

So are they really so different as to want different APIs? That strikes
me as a big driver cost. (A rough sketch of the single flow I have in
mind is at the end of this mail.)

> I'd still prefer to wrap it with dma callers to handle things like
> swiotlb and maybe Xen grant tables and to avoid the type confusion
> between dma_addr_t and then untyped iova in the iommu layer, but
> having this layer or not is probably worth a discussion.

I'm surprised by the idea of random drivers reaching past dma-iommu.c
and into the iommu layer to set up DMA directly on the DMA API's
iommu_domain?? That seems like completely giving up on the DMA API
abstraction to me. :(

IMHO, it needs to be wrapped, and the wrapper needs to do all the
special P2P stuff, at a minimum. The wrapper should multiplex to all the
non-iommu cases for the driver too.

We still need to achieve some kind of abstraction here that doesn't
burden every driver with different code paths for each DMA back end!
Don't we??

Jason
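
For concreteness, here is a rough sketch of the kind of single flow
described above. The names in it (struct phys_vec, dma_iova_alloc(),
dma_iova_link(), dma_iova_free()) are illustrative placeholders for
whatever the new API ends up being called, not functions that exist in
the kernel today:

	/*
	 * Illustrative sketch only: dma_iova_alloc/link/free and
	 * struct phys_vec are placeholder names, not an existing API.
	 */
	#include <linux/dma-mapping.h>
	#include <linux/device.h>
	#include <linux/mm.h>

	struct phys_vec {	/* one CPU physical range of the transfer */
		phys_addr_t paddr;
		size_t len;
	};

	static int map_transfer(struct device *dev, struct phys_vec *vecs,
				unsigned int nr, dma_addr_t *out_iova)
	{
		size_t total = 0, off = 0;
		dma_addr_t iova;
		unsigned int i;
		int ret;

		/* Caller sums the CPU ranges, rounded up to full pages. */
		for (i = 0; i < nr; i++)
			total += ALIGN(offset_in_page(vecs[i].paddr) +
				       vecs[i].len, PAGE_SIZE);

		/* One IOVA allocation covering the whole rounded-up size. */
		iova = dma_iova_alloc(dev, total);	/* placeholder */
		if (!iova)
			return -ENOMEM;

		/*
		 * One call per CPU range to actually program the IOMMU.
		 * The same loop serves aligned and unaligned input; for
		 * aligned input the in-page offsets are simply zero.  The
		 * wrapper, not the driver, decides per range whether to
		 * bounce through swiotlb or take the P2P !ACS path, which
		 * is why the HW programming may not end up linear.
		 */
		for (i = 0; i < nr; i++) {
			ret = dma_iova_link(dev, iova + off, vecs[i].paddr,
					    vecs[i].len,
					    DMA_TO_DEVICE);	/* placeholder */
			if (ret)
				goto err_free;
			off += ALIGN(offset_in_page(vecs[i].paddr) +
				     vecs[i].len, PAGE_SIZE);
		}

		*out_iova = iova;
		return 0;

	err_free:
		/* Single teardown of the range, no per-range dma_addr_t. */
		dma_iova_free(dev, iova, total);	/* placeholder */
		return ret;
	}

The aligned case falls out of the same loop; all the caller gains is
knowing the in-page offsets are zero and that unmap needs nothing beyond
the base IOVA and length.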