On Tue, Jan 11, 2022 at 06:53:06PM -0400, Jason Gunthorpe wrote:
> IOMMU is not common in those cases, it is slow.
>
> So you end up with 16 bytes per entry then another 24 bytes in the
> entirely redundant scatter list.  That is now 40 bytes/page for typical
> HPC case, and I can't see that being OK.

Ah, I didn't realise what case you wanted to optimise for.  So, how
about this ...  Since you want to get to the same destination as I do
(a 16-byte-per-entry dma_addr+dma_len struct), but need to get there
sooner than "make all sg users stop using it wrongly", let's introduce
a (hopefully temporary) "struct dma_range".

But let's go further than that (which only brings us to 32 bytes per
range).  For the systems you care about which use an identity mapping,
and have sizeof(dma_addr_t) == sizeof(phys_addr_t), we can simply
point the dma_range pointer to the same memory as the phyr.  We just
have to not free it too early.  That gets us down to 16 bytes per
range, a saving of 50%.
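
A rough userspace sketch of the idea (the member names, the typedef
stand-ins for the kernel's dma_addr_t/phys_addr_t, and the helper name
dma_map_identity are all my invention, not an actual proposal):

```c
#include <stdint.h>

/* Stand-ins for the kernel types; on the systems in question both are 64-bit. */
typedef uint64_t dma_addr_t;
typedef uint64_t phys_addr_t;

/* A physical range ("phyr"): 16 bytes per entry. */
struct phyr {
	phys_addr_t addr;
	uint64_t    len;
};

/* The proposed dma_addr+dma_len struct: also 16 bytes per entry. */
struct dma_range {
	dma_addr_t addr;
	uint64_t   len;
};

/*
 * With an identity mapping and equal-sized address types, the dma_range
 * array can simply alias the phyr array instead of being allocated
 * separately -- the caller just must not free the phyr array while the
 * dma_range view is still in use.
 */
static struct dma_range *dma_map_identity(struct phyr *p)
{
	_Static_assert(sizeof(struct dma_range) == sizeof(struct phyr),
		       "layouts must match to alias");
	return (struct dma_range *)p;
}
```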