On Tue, 2017-04-18 at 17:21 -0600, Jason Gunthorpe wrote:
> Splitting the sgl is different from iommu batching.
>
> As an example, an O_DIRECT write of 1 MB with a single 4K P2P page in
> the middle.
>
> The optimum behavior is to allocate a 1MB-4K iommu range and fill it
> with the CPU memory. Then return a SGL with three entries, two
> pointing into the range and one to the p2p.
>
> It is creating each range which tends to be expensive, so creating
> two ranges (or worse, if every SGL created a range it would be 255)
> is very undesired.

I think the easier way to get us started is to just use a helper and
stick it in the existing sglist processing loop of the architecture.

As we noticed, stacking dma_ops is actually non-trivial and opens quite
the can of worms.

As Jerome mentioned, you can end up with IO ops containing an sglist
that is a collection of memory and GPU pages, for example.

Cheers,
Ben.
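
To make the quoted 1 MB example concrete, here is a minimal userspace sketch of the splitting it describes: every CPU page is placed in one shared iommu range, while a P2P page gets its own entry pointing at its bus address. This is only a model under stated assumptions, not kernel code: `struct io_page`, `struct sg_ent`, `split_sgl()` and `PAGE_SZ` are hypothetical names invented for illustration, and the actual iommu range allocation is assumed to have happened elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u

/* Hypothetical input: one page of the I/O buffer. */
struct io_page {
    bool is_p2p;        /* true if the page lives in peer device memory */
    uint64_t bus_addr;  /* only meaningful when is_p2p is true */
};

/* Hypothetical output: one entry of the resulting scatter-gather list. */
struct sg_ent {
    uint64_t dma_addr;
    uint32_t len;
    bool is_p2p;
};

/*
 * Walk the pages once. CPU pages are packed back to back into a single
 * iommu range starting at iova_base; P2P pages bypass the range and keep
 * their bus address. Adjacent pages of the same kind with contiguous
 * addresses are merged into one entry.
 *
 * Returns the number of entries written to out[] and stores the size the
 * single iommu range needs in *range_len.
 */
static size_t split_sgl(const struct io_page *pages, size_t npages,
                        uint64_t iova_base, struct sg_ent *out,
                        size_t out_cap, uint64_t *range_len)
{
    size_t n = 0;
    uint64_t iova = iova_base;  /* next free address in the single range */

    for (size_t i = 0; i < npages; i++) {
        bool p2p = pages[i].is_p2p;
        uint64_t addr = p2p ? pages[i].bus_addr : iova;

        if (n > 0 && out[n - 1].is_p2p == p2p &&
            out[n - 1].dma_addr + out[n - 1].len == addr) {
            /* Extends the previous entry. */
            out[n - 1].len += PAGE_SZ;
        } else {
            /* Starts a new entry. */
            assert(n < out_cap);
            out[n].dma_addr = addr;
            out[n].len = PAGE_SZ;
            out[n].is_p2p = p2p;
            n++;
        }

        if (!p2p)
            iova += PAGE_SZ;  /* only CPU pages consume range space */
    }

    *range_len = iova - iova_base;
    return n;
}
```

Fed 256 pages with a single P2P page in the middle, this yields three entries and a single range of 1MB-4K, rather than one range per CPU run.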