> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Saturday, February 11, 2023 8:45 AM
>
> On Fri, Feb 10, 2023 at 04:51:10PM -0700, Alex Williamson wrote:
> > On Tue, 7 Feb 2023 13:17:54 -0800
> > Nicolin Chen <nicolinc@xxxxxxxxxx> wrote:
> >
> > > qemu has a need to replace the translations associated with a domain
> > > when the guest does large-scale operations like switching between an
> > > IDENTITY domain and, say, dma-iommu.c.
> > >
> > > Currently, it does this by replacing all the mappings in a single
> > > domain, but this is very inefficient and means that domains have to be
> > > per-device rather than per-translation.
> > >
> > > Provide a high-level API to allow replacements of one domain with
> > > another. This is similar to a detach/attach cycle except it doesn't
> > > force the group to go to the blocking domain in-between.
> > >
> > > By removing this forced blocking domain the iommu driver has the
> > > opportunity to implement an atomic replacement of the domains to the
> > > greatest extent its hardware allows.
> > >
> > > It could be possible to address this by simply removing the protection
> > > from the iommu_attach_group(), but it is not so clear if that is safe
> > > for the few users. Thus, add a new API to serve this new purpose.
> > >
> > > Atomic replacement allows the qemu emulation of the viommu to be more
> > > complete, as real hardware has this ability.
> >
> > I was under the impression that we could not atomically switch a
> > device's domain relative to in-flight DMA.

It's possible as long as the mappings for in-flight DMA don't change
during the transition.

> Certainly all the HW can be proper atomic but not necessarily easily -
> the usual issue is a SW complication to manage the software controlled
> cache tags in a way that doesn't corrupt the cache.
>
> This is because the cache tag and the io page table top are in
> different 64 bit words so atomic writes don't cover both, and thus the
> IOMMU HW could tear the two stores and mismatch the cache tag to the
> table top. This would corrupt the cache.

The VT-d spec recommends using a 128-bit cmpxchg instruction to update
the page table pointer and DID together (see the sketch at the end of
this mail).

> The easiest way to avoid this is for SW to use the same DID for the
> new and old tables. This is possible if this translation entry is the
> only user of the DID. A more complex way would use a temporary DID
> that can be safely corrupted. But realistically I'd expect VT-d
> drivers to simply make the PASID invalid for the duration of the
> update.
>
> However something like AMD has a single cache tag for every entry in
> the PASID table so you could do an atomic replace trivially. Just
> update the PASID and invalidate the caches.
>
> ARM has a flexible PASID table and atomic replace would be part of
> resizing it. eg you can atomically update the top of the PASID table
> with a single 64 bit store.
>
> So replace lets the drivers implement those special behaviors if it
> makes sense for them.
>
> > Or maybe atomic is the wrong word here since we expect no in-flight DMA
> > during the sort of mode transitions referred to here, and we're really
> > just trying to convey that we can do this via a single operation with
> > reduced latency? Thanks,
>
> atomic means DMA will either translate with the old domain or the new
> domain but never a blocking domain. Keep in mind that with nesting
> "domain" can mean a full PASID table in guest memory.
>
> I should reiterate that replace is not an API that is required to be
> atomic.
>

Yes.
Just as explained in the commit message:

  "By removing this forced blocking domain the iommu driver has the
   opportunity to implement an atomic replacement of the domains to the
   greatest extent its hardware allows."

At the API level, replace only implies removing the transition to/from
the blocking domain; it is then the driver's call whether it wants to
take this chance to implement a truly 'atomic' behavior.
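
To make the API-level point concrete, here is a rough sketch from the
caller's perspective. It assumes the new helper ends up with a name and
signature along the lines of iommu_group_replace_domain(); that name is
illustrative here, and only iommu_detach_group()/iommu_attach_group()
are existing APIs:

  /*
   * Today: moving a group from one domain to another goes through a
   * detach, so DMA translates through the group's blocking domain in
   * the window between the two calls.
   */
  iommu_detach_group(old_domain, group);        /* group -> blocking domain */
  rc = iommu_attach_group(new_domain, group);   /* blocking -> new domain   */

  /*
   * Replace: a single operation, so the driver may switch the
   * translation from old_domain straight to new_domain.  Whether that
   * switch is truly atomic w.r.t. in-flight DMA is up to the driver
   * and its hardware.
   */
  rc = iommu_group_replace_domain(group, new_domain);  /* hypothetical name */

And for the 128-bit update mentioned above, a minimal userspace sketch
of the idea, assuming a made-up 128-bit entry layout with the io page
table top in the low qword and the DID in the high qword (the real
context/PASID entry layout and the in-kernel primitive would differ):

  /* build: gcc -O2 -mcx16 entry.c -o entry   (x86-64, may need -latomic) */
  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Illustrative 128-bit "translation entry"; layout is made up. */
  struct xlate_entry {
          uint64_t lo;    /* bits 63:12 = io page table top */
          uint64_t hi;    /* bits 15:0  = DID (cache tag)   */
  } __attribute__((aligned(16)));

  static bool replace_entry(struct xlate_entry *e,
                            uint64_t new_pgtbl, uint64_t new_did)
  {
          struct xlate_entry old, new;

          __atomic_load(e, &old, __ATOMIC_ACQUIRE);
          new.lo = (old.lo & 0xfffULL) | (new_pgtbl & ~0xfffULL);
          new.hi = (old.hi & ~0xffffULL) | (new_did & 0xffffULL);

          /*
           * One 16-byte compare-and-swap: a walker reading the entry
           * sees either the old pair or the new pair, never a torn mix
           * of new table top + stale DID (or vice versa).
           */
          return __atomic_compare_exchange(e, &old, &new, false,
                                           __ATOMIC_RELEASE, __ATOMIC_RELAXED);
  }

  int main(void)
  {
          static struct xlate_entry e = { .lo = 0x1000, .hi = 1 };

          if (replace_entry(&e, 0x2000, 2))
                  printf("replaced: pgtbl=%#llx did=%llu\n",
                         (unsigned long long)(e.lo & ~0xfffULL),
                         (unsigned long long)(e.hi & 0xffffULL));
          return 0;
  }

The point of the sketch is only the single wide CAS; drivers that can't
rely on such an update would instead reuse the old DID or park the
entry (e.g. make the PASID invalid) for the duration, as described
above.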