On Mon, Oct 15, 2018 at 09:08:41PM +1100, Alexey Kardashevskiy wrote:
> The powernv platform maintains 2 TCE tables for VFIO: a hardware TCE
> table and a table with userspace addresses. These tables are radix trees;
> we allocate indirect levels when they are written to. Since
> memory allocation is problematic in real mode, we have 2 accessors
> to the entries:
> - for virtual mode: it allocates the memory and it is always expected
>   to return non-NULL;
> - for real mode: it does not allocate and can return NULL.
>
> Also, DMA windows can span up to 55 bits of the address space, and since
> we never have this much RAM, such windows are sparse. However, currently
> the SPAPR TCE IOMMU driver walks through all TCEs to unpin DMA memory.
>
> Since we maintain a userspace addresses table for VFIO which is a mirror
> of the hardware table, we can use it to know which parts of the DMA
> window have not been mapped and skip them, which is what this patch does.
>
> The bare metal systems do not have this problem as they use a bypass mode
> of a PHB which maps RAM directly.
>
> This helps a lot with sparse DMA windows, reducing the shutdown time from
> about 3 minutes per 1 billion TCEs to a few seconds for a 32GB sparse guest.
> Just skipping the last level seems to be good enough.
>
> As the non-allocating accessor is now used in virtual mode as well, rename it
> from IOMMU_TABLE_USERSPACE_ENTRY_RM (real mode) to _RO (read only).
>
> Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx>

Thanks, applied to my kvm-ppc-next branch, and now in the kvm next branch also.

Paul.
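The idea in the quoted patch (walk the sparse userspace-address mirror with a non-allocating accessor and skip whole last-level pages that were never allocated) can be illustrated with a minimal user-space C sketch. This is not the kernel code: the two-level layout, `ENTRIES_PER_LEVEL`, `struct ua_table`, and the functions `ua_entry_alloc`, `ua_entry_ro`, `ua_unpin_range` and `ua_table_free` are hypothetical stand-ins for the real TCE/userspace-address tables and the `IOMMU_TABLE_USERSPACE_ENTRY_RO` accessor.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical two-level table: a top level of pointers to leaf pages.
 * A leaf page is allocated only when an entry in it is first written. */
#define ENTRIES_PER_LEVEL 512UL

struct ua_table {
    unsigned long *levels[ENTRIES_PER_LEVEL]; /* NULL = never written */
};

/* Allocating accessor (the "virtual mode" style): creates the leaf on
 * demand; returns NULL only if calloc() fails. */
static unsigned long *ua_entry_alloc(struct ua_table *t, unsigned long idx)
{
    unsigned long top = idx / ENTRIES_PER_LEVEL;

    assert(idx < ENTRIES_PER_LEVEL * ENTRIES_PER_LEVEL);
    if (!t->levels[top]) {
        t->levels[top] = calloc(ENTRIES_PER_LEVEL, sizeof(unsigned long));
        if (!t->levels[top])
            return NULL;
    }
    return &t->levels[top][idx % ENTRIES_PER_LEVEL];
}

/* Read-only accessor: never allocates; NULL means the leaf is missing. */
static unsigned long *ua_entry_ro(struct ua_table *t, unsigned long idx)
{
    unsigned long *leaf;

    assert(idx < ENTRIES_PER_LEVEL * ENTRIES_PER_LEVEL);
    leaf = t->levels[idx / ENTRIES_PER_LEVEL];
    return leaf ? &leaf[idx % ENTRIES_PER_LEVEL] : NULL;
}

/* Clear ("unpin") every nonzero entry in [start, end), skipping an entire
 * leaf whenever the RO accessor reports it was never allocated.
 * Returns how many entries were actually examined. */
static unsigned long ua_unpin_range(struct ua_table *t, unsigned long start,
                                    unsigned long end, unsigned long *unpinned)
{
    unsigned long idx = start, examined = 0;

    *unpinned = 0;
    while (idx < end) {
        unsigned long *ua = ua_entry_ro(t, idx);

        if (!ua) {
            /* Jump to the first index of the next leaf level. */
            idx = (idx / ENTRIES_PER_LEVEL + 1) * ENTRIES_PER_LEVEL;
            continue;
        }
        examined++;
        if (*ua) {
            *ua = 0; /* a real driver would unpin the page here */
            (*unpinned)++;
        }
        idx++;
    }
    return examined;
}

/* Release all leaf pages. */
static void ua_table_free(struct ua_table *t)
{
    for (unsigned long i = 0; i < ENTRIES_PER_LEVEL; i++) {
        free(t->levels[i]);
        t->levels[i] = NULL;
    }
}
```

For example, writing entries 5 and 600 allocates only leaves 0 and 1, so unpinning the full 262144-entry range examines 1024 entries instead of all of them. As in the patch, only the last level is skipped; a deeper tree could also skip missing indirect levels, but the quoted description suggests that was not needed.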