> From: Joao Martins <joao.m.martins@xxxxxxxxxx>
> Sent: Thursday, May 5, 2022 6:07 PM
>
> On 5/5/22 08:42, Tian, Kevin wrote:
> >> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> >> Sent: Tuesday, May 3, 2022 2:53 AM
> >>
> >> On Mon, May 02, 2022 at 12:11:07PM -0600, Alex Williamson wrote:
> >>> On Fri, 29 Apr 2022 05:45:20 +0000
> >>> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> >>>>> From: Joao Martins <joao.m.martins@xxxxxxxxxx>
> >>>>> 3) Unmapping an IOVA range while returning its dirty bit prior to
> >>>>> unmap. This case is specific to the non-nested vIOMMU case, where
> >>>>> an erroneous guest (or device) is DMAing to an address being
> >>>>> unmapped at the same time.
> >>>>
> >>>> An erroneous attempt like the above cannot anticipate which DMAs
> >>>> will succeed in that window, thus the end behavior is undefined.
> >>>> For undefined behavior nothing is broken by losing some bits
> >>>> dirtied in the window between reading back the dirty bits of the
> >>>> range and actually calling unmap. From the guest p.o.v. all of
> >>>> this is black-box hardware logic serving a virtual IOTLB
> >>>> invalidation request which just cannot be completed in one cycle.
> >>>>
> >>>> Hence in reality this is probably not required except to meet the
> >>>> vfio compat requirement. Just in concept, returning dirty bits at
> >>>> unmap is more accurate.
> >>>>
> >>>> I'm slightly inclined to abandon it in the iommufd uAPI.
> >>>
> >>> Sorry, I'm not following why an unmap with returned dirty bitmap
> >>> operation is specific to a vIOMMU case, or in fact indicative of
> >>> some sort of erroneous, racy behavior of guest or device.
> >>
> >> It is being compared against the alternative, which is to explicitly
> >> query dirty then do a normal unmap as two system calls and permit a
> >> race.
> >>
> >> The only case with any difference is if the guest is racing DMA with
> >> the unmap - in which case it is already indeterminate for the guest
> >> if the DMA will be completed or not.
> >>
> >> eg on the vIOMMU case if the guest races DMA with unmap then we are
> >> already fine with throwing away that DMA, because that is how the
> >> race resolves during non-migration situations, so resolving it as
> >> throwing away the DMA during migration is OK too.
> >>
> >>> We need the flexibility to support memory hot-unplug operations
> >>> during migration,
> >>
> >> I would have thought that hot-unplug during migration would simply
> >> discard all the data - how does it use the dirty bitmap?
> >>
> >>> This was implemented as a single operation specifically to avoid
> >>> races where ongoing access may be available after retrieving a
> >>> snapshot of the bitmap. Thanks,
> >>
> >> The issue is the cost.
> >>
> >> On a real IOMMU eliminating the race is expensive, as we have to
> >> write-protect the pages before querying dirty, which seems to
> >> require an extra IOTLB flush.
> >>
> >> It is not clear that paying this cost to become atomic is something
> >> any use case actually needs.
> >>
> >> So, I suggest we think about a 3rd op 'write protect and clear
> >> dirties' that will be followed by a normal unmap - the extra op
> >> will have the extra overhead and userspace can decide if it wants
> >> to pay or not vs the non-atomic read-dirties operation. And let's
> >> have a use case where this must be atomic before we implement it..
> >
> > and write-protection also relies on the support of I/O page fault...
> >
> /I think/ all IOMMUs in this series have supported
> permission/unrecoverable I/O page faults for a long time, IIUC.
>
> The earlier suggestion was just to discard the I/O page fault after
> write-protection happens. fwiw, some IOMMUs also support suppressing
> the event notification (like AMD).

IIUC the purpose of 'write-protection' here is to capture in-flight
dirty pages in the said race window until the unmap and IOTLB
invalidation are completed.
*unrecoverable* faults are not expected to be used in a feature path,
as the occurrence of such faults may trigger a severe reaction in
IOMMU drivers, e.g. completely blocking DMA from the device causing
such faults.