Hi,

* Problem Statement :-
-------------------------------------------------------------------------

The existing IOMMU DMA mapping operation is performed in two steps at the
same time (one-shot):
1. Allocates IOVA space.
2. Actually maps DMA pages to that space.

For example, to map a scatter-gather list:

dma_map_sg_attrs()
 __dma_map_sg_attrs()
  ops->map_sg()
   iommu_dma_map_sg()
    Calculate the length of the IOVA space that is needed
    /* ####### step one: allocate IOVA space ####### */
    iommu_dma_alloc_iova()
    /* ####### step two: actually map DMA pages ####### */
    iommu_map_sg()
     for each entry in sg list:
      __iommu_map()
       iommu_domain_ops->map_pages()

This one-shot operation works perfectly for non-complex scenarios where
callers use the existing DMA API in the control path when they set up
hardware. However, in more complex scenarios, when DMA mapping is needed
in the data path, and especially when some sort of specific intermediary
datatype is involved (e.g., an sg list), this one-shot approach:

1. Forces developers to introduce new DMA APIs for each specific
   datatype, e.g., the existing scatter-gather mapping functions spread
   across subsystems:

   dma_map_sgtable()
    __dma_map_sg_attrs()
   dma_unmap_sg_attrs()
   blk_rq_map_sg()
    __blk_rq_map_sg()
     __blk_bvec_map_sg()
     __blk_bios_map_sg()
      blk_bvec_map_sg()

   OR

   Chuck's latest RFC series [1], which aims to incorporate
   biovec-related DMA mapping (expanding bio_vec with DMA addresses).
   Struct folio will probably require similar treatment as well.

2. Creates a dependency on a data type, forcing allocation/de-allocation
   of that intermediary data type and page-to-data-type mapping and
   unmapping in the fast path (submission or completion).
* Proposed approach and discussion points :-
-------------------------------------------------------------------------

Instead of teaching the DMA APIs about specific datatypes and creating a
dependency on them, which may add mapping and allocation overhead, we
propose to separate the existing DMA mapping routine into two steps:

Step 1 : Provide an option for API users (subsystems) to perform all
         calculations internally in advance.
Step 2 : Map pages when they are needed.

These advanced DMA mapping APIs are needed to calculate the IOVA size to
allocate as one chunk, plus a combination of offset calculations to know
which part of the IOVA should be mapped to which page. The new APIs will
also allow us to remove the dependency on the sg list, as discussed
previously in [2].

The main advantages of this approach, as seen in the upcoming RFC
series, are:

1. Simplified and faster page fault handling for On-Demand-Paging (ODP)
   mode in RDMA.
2. Reduced memory footprint for the VFIO PCI live migration code.
3. Reduced overhead of intermediary sg table manipulation in the fast
   path for storage drivers, where block layer requests are mapped onto
   an sg table and the sg table is then mapped onto DMA:

   xxx_queue_rq()
    allocate sg table
    blk_rq_map_sg()    /* merges and maps bvecs to sg */
    dma_map_sgtable()  /* maps pages in sg to DMA */

In order to create a good platform for a concrete and meaningful
discussion at LSFMM 24, we plan to post an RFC within the next two
weeks.

Required attendees list :-

Christoph Hellwig
Jason Gunthorpe
Jens Axboe
Chuck Lever
David Howells
Keith Busch
Bart Van Assche
Damien Le Moal
Martin Petersen

-ck

[1] https://lore.kernel.org/all/169772852492.5232.17148564580779995849.stgit@xxxxxxxxxxxxxxxxxxxxx
[2] https://lore.kernel.org/linux-iommu/20200708065014.GA5694@xxxxxx/