I finally found some time to look over this, and I have some very high
level problems with the implementation:

 - the DMA streaming ops (dma_map_*) are only intended for short term
   mappings, not for long living ones, which can lead to starvation,
   especially with swiotlb but also with some IOMMUs.  I think this
   needs to be changed to an API that can allocate DMA mapped memory
   using dma_alloc_noncoherent/dma_alloc_noncontiguous for the device
   and then give access to that to the user instead
 - I really do not like the special casing in the bio.  Did you try to
   just stash away the DMA mapping information in an efficient lookup
   data structure (e.g. rhashtable, but details might need analysis
   and benchmarking) and thus touch very little code outside of the
   driver I/O path and the place that performs the mapping?
 - the design seems to ignore DMA ownership.  Every time data is
   transferred, ownership of the buffer needs to be transferred to and
   from the device; take a look at Documentation/core-api/dma-api.rst
   and Documentation/core-api/dma-api-howto.rst.

As for the multiple devices discussion: mapping memory for multiple
devices is possible, but nontrivial to get right, mostly due to the
ownership.  So unless we have a really good reason I'd suggest not
supporting this initially.
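
To make the ownership point concrete, here is a rough userspace model of
the handoff, not kernel code.  All of the model_* names are made up for
illustration; in the kernel the two handoff steps correspond to the
dma_sync_single_for_device() and dma_sync_single_for_cpu() family
described in the documents above.  The model only tracks who currently
owns the buffer and refuses access from the other side:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model of DMA buffer ownership, NOT kernel API. */
enum model_owner { OWNER_CPU, OWNER_DEVICE };

struct model_buf {
	unsigned char data[64];
	enum model_owner owner;
};

/* Models dma_sync_*_for_device(): the CPU hands the buffer to the device. */
static void model_sync_for_device(struct model_buf *b)
{
	assert(b != NULL);
	b->owner = OWNER_DEVICE;
}

/* Models dma_sync_*_for_cpu(): the device hands the buffer back to the CPU. */
static void model_sync_for_cpu(struct model_buf *b)
{
	assert(b != NULL);
	b->owner = OWNER_CPU;
}

/* CPU write: only allowed while the CPU owns the buffer; -1 otherwise. */
static int model_cpu_write(struct model_buf *b, const void *src, size_t len)
{
	if (b->owner != OWNER_CPU || len > sizeof(b->data))
		return -1;
	memcpy(b->data, src, len);
	return 0;
}

/* CPU read: only allowed while the CPU owns the buffer; -1 otherwise. */
static int model_cpu_read(const struct model_buf *b, void *dst, size_t len)
{
	if (b->owner != OWNER_CPU || len > sizeof(b->data))
		return -1;
	memcpy(dst, b->data, len);
	return 0;
}

/* Stand-in for the device DMAing into the buffer; needs device ownership. */
static int model_device_fill(struct model_buf *b, unsigned char value)
{
	if (b->owner != OWNER_DEVICE)
		return -1;
	memset(b->data, value, sizeof(b->data));
	return 0;
}
```

A long-lived mapping still has to go through this handoff around every
transfer: sync for the device before the I/O is started, and sync for
the CPU before the result is looked at.  With more than one device
sharing the buffer, each of them would need its own handoff, which is
where it gets hard to get right.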