On Thu, 2019-08-29 at 08:52:48 UTC, Alexey Kardashevskiy wrote:
> At the moment updates in a TCE table are made by iommu_table_ops::exchange
> which updates one TCE and invalidates an entry in the PHB/NPU TCE cache
> via a set of registers called "TCE Kill" (hence the naming).
> Writing a TCE is a simple xchg() but invalidating the TCE cache is
> a relatively expensive OPAL call. Mapping a 100GB guest with PCI+NPU
> passed through devices takes about 20s.
>
> Thankfully we can do better. Since such big mappings happen at the boot
> time and when memory is plugged/onlined (i.e. not often), these requests
> come in 512 pages so we call OPAL 512 times less which brings 20s
> from the above to less than 10s. Also, since TCE caches can be flushed
> entirely, calling OPAL for 512 TCEs helps skiboot [1] to decide whether
> to flush the entire cache or not.
>
> This implements 2 new iommu_table_ops callbacks:
> - xchg_no_kill() to update a single TCE with no TCE invalidation;
> - tce_kill() to invalidate multiple TCEs.
> This uses the same xchg_no_kill() callback for IODA1/2.
>
> This implements 2 new wrappers on top of the new callbacks similar to
> the existing iommu_tce_xchg().
>
> This does not use the new callbacks yet, the next patches will;
> so this should not cause any behavioral change.
>
> Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx>

Series applied to powerpc topic/ppc-kvm, thanks.

https://git.kernel.org/powerpc/c/35872480da47ec714fd9c4f2f3d2d83daf304851

cheers
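
For readers skimming the archive, the batching idea in the quoted commit message can be illustrated with a small userspace sketch. This is not the kernel code from the patch: every `toy_*` name, the table size, and the 4K page size are hypothetical, and the "kill" function only counts calls as a stand-in for the expensive OPAL invalidation. It contrasts invalidating after every TCE update with updating a whole range first and invalidating once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
#define TOY_TABLE_SIZE 2048
#define TOY_PAGE_SIZE  4096ULL

struct toy_tce_table {
	uint64_t tce[TOY_TABLE_SIZE];
	unsigned long kill_calls;	/* counts simulated cache invalidations */
};

/* Swap in one entry and return the old one; no invalidation here. */
static uint64_t toy_xchg_no_kill(struct toy_tce_table *tbl, size_t idx,
				 uint64_t new_tce)
{
	uint64_t old;

	assert(idx < TOY_TABLE_SIZE);
	old = tbl->tce[idx];
	tbl->tce[idx] = new_tce;
	return old;
}

/* Stand-in for the expensive invalidation of a range of entries. */
static void toy_tce_kill(struct toy_tce_table *tbl, size_t first,
			 size_t npages)
{
	assert(first + npages <= TOY_TABLE_SIZE);
	tbl->kill_calls++;
}

/* Unbatched pattern: one invalidation per updated entry. */
static void toy_map_one_by_one(struct toy_tce_table *tbl, size_t first,
			       size_t npages, uint64_t base)
{
	for (size_t i = 0; i < npages; i++) {
		toy_xchg_no_kill(tbl, first + i, base + i * TOY_PAGE_SIZE);
		toy_tce_kill(tbl, first + i, 1);
	}
}

/* Batched pattern: update every entry, then invalidate the range once. */
static void toy_map_batched(struct toy_tce_table *tbl, size_t first,
			    size_t npages, uint64_t base)
{
	for (size_t i = 0; i < npages; i++)
		toy_xchg_no_kill(tbl, first + i, base + i * TOY_PAGE_SIZE);
	toy_tce_kill(tbl, first, npages);
}
```

With a 512-page request, the unbatched helper triggers 512 simulated invalidations and the batched one triggers a single call, while both leave the table contents identical. Passing the whole range to one call is also what would let a firmware-side handler choose between per-entry invalidation and a full cache flush, as the commit message notes for skiboot.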