Jason Gunthorpe <jgg@xxxxxxxx> writes: > On Fri, Dec 04, 2020 at 03:05:46PM -0500, Daniel Jordan wrote: >> Well Alex can correct me, but I went digging and a comment from the >> first type1 vfio commit says the iommu API didn't promise to unmap >> subpages of previous mappings, so doing page at a time gave flexibility >> at the cost of inefficiency. > > iommu restrictions are not related to with gup. vfio needs to get the > page list from the page tables as efficiently as possible, then you > break it up into what you want to feed into the IOMMU how the iommu > wants. > > vfio must maintain a page list to call unpin_user_pages() anyhow, so It does in some cases but not others, namely the expensive VFIO_IOMMU_MAP_DMA/UNMAP_DMA path where the iommu page tables are used to find the pfns when unpinning. I don't see why vfio couldn't do as you say, though, and the worst case memory overhead of using scatterlist to remember the pfns of a 300g VM backed by huge but physically discontiguous pages is only a few meg, not bad at all. > it makes alot of sense to assemble the page list up front, then do the > iommu, instead of trying to do both things page at a time. > > It would be smart to rebuild vfio to use scatter lists to store the > page list and then break the sgl into pages for iommu > configuration. SGLs will consume alot less memory for the usual case > of THPs backing the VFIO registrations. > > ib_umem_get() has some example of how to code this, I've been thinking > we could make this some common API, and it could be further optimized. Agreed, great suggestions, both above and in the rest of your response. >> Yesterday I tried optimizing vfio to skip gup calls for tail pages after >> Matthew pointed out this same issue to me by coincidence last week. > > Please don't just hack up vfio like this. Yeah, you've cured me of that idea. I'll see where I get experimenting with some of this stuff.