> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Tuesday, May 3, 2022 2:53 AM
>
> On Mon, May 02, 2022 at 12:11:07PM -0600, Alex Williamson wrote:
> > On Fri, 29 Apr 2022 05:45:20 +0000
> > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> > > > From: Joao Martins <joao.m.martins@xxxxxxxxxx>
> > > > 3) Unmapping an IOVA range while returning its dirty bit prior
> > > > to unmap. This case is specific to the non-nested vIOMMU case,
> > > > where an erroneous guest (or device) is DMAing to an address
> > > > being unmapped at the same time.
> > >
> > > An erroneous attempt like the above cannot anticipate which DMAs
> > > will succeed in that window, so the end behavior is undefined.
> > > For undefined behavior, nothing is broken by losing some bits
> > > dirtied in the window between reading back the dirty bits of the
> > > range and actually calling unmap. From the guest p.o.v. this is
> > > all black-box hardware logic serving a virtual iotlb invalidation
> > > request, which simply cannot be completed in one cycle.
> > >
> > > Hence in reality this is probably not required except to meet the
> > > vfio compat requirement. Just in concept, returning dirty bits at
> > > unmap is more accurate.
> > >
> > > I'm slightly inclined to abandon it in the iommufd uAPI.
> >
> > Sorry, I'm not following why an unmap with returned dirty bitmap
> > operation is specific to a vIOMMU case, or in fact indicative of
> > some sort of erroneous, racy behavior of guest or device.
>
> It is being compared against the alternative, which is to explicitly
> query dirty and then do a normal unmap as two system calls, permitting
> a race.
>
> The only case with any difference is if the guest is racing DMA with
> the unmap - in which case it is already indeterminate for the guest
> whether the DMA will complete or not.
>
> E.g. in the vIOMMU case, if the guest races DMA with unmap then we
> are already fine with throwing away that DMA, because that is how the
> race resolves during non-migration situations, so resolving it by
> throwing away the DMA during migration is OK too.
>
> > We need the flexibility to support memory hot-unplug operations
> > during migration,
>
> I would have thought that hot-unplug during migration would simply
> discard all the data - how does it use the dirty bitmap?
>
> > This was implemented as a single operation specifically to avoid
> > races where ongoing access may be available after retrieving a
> > snapshot of the bitmap. Thanks,
>
> The issue is the cost.
>
> On a real iommu, eliminating the race is expensive: we have to
> write-protect the pages before querying dirty, which seems to require
> an extra IOTLB flush.
>
> It is not clear that paying this cost to become atomic is something
> any use case actually needs.
>
> So, I suggest we think about a 3rd op, 'write protect and clear
> dirties', to be followed by a normal unmap - the extra op carries the
> extra overhead, and userspace can decide whether it wants to pay it
> vs the non-atomic read-dirties operation. And let's have a use case
> where this must be atomic before we implement it..

And write-protection also relies on I/O page fault support...

> The downside is we lose a little bit of efficiency by unbundling
> these steps; the upside is that it doesn't require quite as many
> special iommu_domain/etc paths.
>
> (Also Joao, you should probably have a read-and-do-not-clear dirty
> operation, with the idea that the next operation will be unmap - then
> maybe we can avoid the IOTLB flushing..)
>
> Jason
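
To make the two designs being weighed above concrete, here is a
minimal userspace sketch. It assumes nothing about the real iommufd
uAPI: the struct layout, the request codes, and their numbering are
all invented placeholders; only the ordering of the calls and the
location of the race window matter.

/*
 * Hypothetical sketch only: every ioctl request code and struct
 * below is invented for illustration; none of this is a real uAPI.
 */
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

struct dirty_query {            /* hypothetical */
        uint64_t iova;          /* start of the IOVA range */
        uint64_t length;        /* length of the range in bytes */
        uint64_t *bitmap;       /* out: one bit per page */
};

#define IOMMUFD_UNMAP_GET_DIRTY _IOW('j', 0x80, struct dirty_query)
#define IOMMUFD_DIRTY_QUERY     _IOW('j', 0x81, struct dirty_query)
#define IOMMUFD_UNMAP           _IOW('j', 0x82, struct dirty_query)

/*
 * Bundled op (the vfio-compat semantic): dirty bits are returned as
 * part of the unmap itself, so no DMA can dirty a page after the
 * snapshot is taken.
 */
static int unmap_get_dirty_bundled(int iommufd, struct dirty_query *q)
{
        return ioctl(iommufd, IOMMUFD_UNMAP_GET_DIRTY, q);
}

/*
 * Split ops (the alternative discussed above): a racing DMA may dirty
 * a page between the two calls and that bit is lost - which is argued
 * to be fine, since such a DMA is already indeterminate for the guest.
 */
static int unmap_get_dirty_split(int iommufd, struct dirty_query *q)
{
        int ret = ioctl(iommufd, IOMMUFD_DIRTY_QUERY, q);

        if (ret)
                return ret;
        /* <-- race window: in-flight DMA can still dirty pages here */
        return ioctl(iommufd, IOMMUFD_UNMAP, q);
}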
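
The proposed 3rd op and the read-without-clear optimization slot into
the same sketch. Again, the request codes continue the placeholders
above; whether write-protect costs an extra flush or needs I/O page
fault support is per the discussion in the thread, not something this
sketch establishes.

/*
 * Opt-in atomic variant using the suggested 3rd op: write-protect the
 * range and read+clear its dirty bits in one call.  This is where the
 * extra IOTLB flush is paid, and (per the note above) it leans on I/O
 * page fault support so a write to a protected page can be handled
 * rather than silently dropped.
 */
#define IOMMUFD_WRPROTECT_GET_DIRTY _IOW('j', 0x83, struct dirty_query)

static int unmap_get_dirty_atomic(int iommufd, struct dirty_query *q)
{
        int ret = ioctl(iommufd, IOMMUFD_WRPROTECT_GET_DIRTY, q);

        if (ret)
                return ret;
        /*
         * The range is now read-only to the device, so nothing can
         * dirty it before the unmap completes.
         */
        return ioctl(iommufd, IOMMUFD_UNMAP, q);
}

/*
 * Jason's parenthetical: when the very next op is unmap, read the
 * dirty bits without clearing them - the clear's IOTLB flush can be
 * skipped because the unmap will flush anyway.
 */
#define IOMMUFD_DIRTY_QUERY_NO_CLEAR _IOW('j', 0x84, struct dirty_query)

static int unmap_get_dirty_no_clear(int iommufd, struct dirty_query *q)
{
        int ret = ioctl(iommufd, IOMMUFD_DIRTY_QUERY_NO_CLEAR, q);

        if (ret)
                return ret;
        return ioctl(iommufd, IOMMUFD_UNMAP, q);
}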