On Wed, Mar 06, 2024 at 03:33:21PM +0100, Christoph Hellwig wrote:
> On Tue, Mar 05, 2024 at 08:51:56AM -0700, Keith Busch wrote:
> > On Tue, Mar 05, 2024 at 01:18:47PM +0200, Leon Romanovsky wrote:
> > > @@ -236,7 +236,9 @@ struct nvme_iod {
> > >  	unsigned int dma_len;	/* length of single DMA segment mapping */
> > >  	dma_addr_t first_dma;
> > >  	dma_addr_t meta_dma;
> > > -	struct sg_table sgt;
> > > +	struct dma_iova_attrs iova;
> > > +	dma_addr_t dma_link_address[128];
> > > +	u16 nr_dma_link_address;
> > >  	union nvme_descriptor list[NVME_MAX_NR_ALLOCATIONS];
> > >  };
> >
> > That's quite a lot of space to add to the iod. We preallocate one for
> > every request, and there could be millions of them.
>
> Yes. And this whole proposal also seems clearly confused (not just
> because of the gazillion reposts) but because it mixes up the case
> where we can coalesce CPU regions into a single dma_addr_t range
> (iommu and maybe in the future swiotlb) and one where we need a

I had the broad expectation that the DMA API user would already be
providing a place to store the dma_addr_t, as it has to feed that into
the HW. That memory should simply last up until we do the dma unmap, and
the cases that need the dma_addr_t during unmap can go get it from there.

If that is how things are organized, is there another reason to lean
further into optimizing the single-range case?

We can't do much on the map side, as a single range doesn't imply a
contiguous range: P2P and alignment create discontinuities in the
dma_addr_t that still have to be dealt with.

Jason
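
To make the arrangement described above concrete, here is a rough userspace sketch of a driver that keeps its only copy of each DMA address in the descriptors it already has to hand to the hardware, and reads them back at unmap time instead of carrying a separate per-request array. Every name in it (mock_dma_addr_t, mock_hw_desc, mock_request, mock_map, mock_unmap and so on) is hypothetical and invented for illustration. None of it is a kernel API, and it is not taken from the patch under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; these are NOT kernel APIs. */
typedef uint64_t mock_dma_addr_t;

#define MOCK_MAX_DESC 8

/* Models a descriptor the driver must fill in for the hardware anyway. */
struct mock_hw_desc {
	mock_dma_addr_t addr;
	uint32_t len;
};

/* Per-request state: no side array of addresses, only the descriptors. */
struct mock_request {
	struct mock_hw_desc desc[MOCK_MAX_DESC];
	size_t nr_desc;
};

/* Running total of bytes released, so the effect of unmap is observable. */
static size_t mock_unmapped_bytes;

/* Toy "map": pretend the DMA address is the CPU address plus an offset. */
static mock_dma_addr_t mock_map(uintptr_t cpu_addr)
{
	return (mock_dma_addr_t)cpu_addr + 0x1000;
}

/* Toy "unmap": only accounts for the length it was asked to release. */
static void mock_unmap(mock_dma_addr_t addr, uint32_t len)
{
	(void)addr;
	mock_unmapped_bytes += len;
}

/* Map side: each address is written once, straight into its descriptor. */
static int mock_request_map(struct mock_request *req, const uintptr_t *cpu,
			    const uint32_t *len, size_t n)
{
	size_t i;

	if (n > MOCK_MAX_DESC)
		return -1;

	for (i = 0; i < n; i++) {
		req->desc[i].addr = mock_map(cpu[i]);
		req->desc[i].len = len[i];
	}
	req->nr_desc = n;
	return 0;
}

/* Unmap side: the addresses are read back from the same descriptors. */
static void mock_request_unmap(struct mock_request *req)
{
	size_t i;

	for (i = 0; i < req->nr_desc; i++)
		mock_unmap(req->desc[i].addr, req->desc[i].len);
	req->nr_desc = 0;
}
```

The sketch maps each segment separately, so it only models the case where the address is needed again at unmap. It says nothing about how a coalesced IOVA range would be torn down, and it assumes the descriptor memory stays valid until unmap.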