> From: Joao Martins <joao.m.martins@xxxxxxxxxx>
> Sent: Thursday, May 5, 2022 7:51 PM
>
> On 5/5/22 12:03, Tian, Kevin wrote:
> >> From: Joao Martins <joao.m.martins@xxxxxxxxxx>
> >> Sent: Thursday, May 5, 2022 6:07 PM
> >>
> >> On 5/5/22 08:42, Tian, Kevin wrote:
> >>>> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> >>>> Sent: Tuesday, May 3, 2022 2:53 AM
> >>>>
> >>>> On Mon, May 02, 2022 at 12:11:07PM -0600, Alex Williamson wrote:
> >>>>> On Fri, 29 Apr 2022 05:45:20 +0000
> >>>>> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> >>>>>>> From: Joao Martins <joao.m.martins@xxxxxxxxxx>
> >>>>>>> 3) Unmapping an IOVA range while returning its dirty bit prior to
> >>>>>>> unmap. This case is specific to the non-nested vIOMMU case where an
> >>>>>>> erroneous guest (or device) is DMAing to an address being unmapped
> >>>>>>> at the same time.
> >>>>>>
> >>>>>> an erroneous attempt like the above cannot anticipate which DMAs can
> >>>>>> succeed in that window, thus the end behavior is undefined. For an
> >>>>>> undefined behavior nothing will be broken by losing some bits dirtied
> >>>>>> in the window between reading back dirty bits of the range and
> >>>>>> actually calling unmap. From the guest p.o.v. all of this is black-box
> >>>>>> hardware logic serving a virtual iotlb invalidation request which just
> >>>>>> cannot be completed in one cycle.
> >>>>>>
> >>>>>> Hence in reality this is probably not required except to meet the vfio
> >>>>>> compat requirement. Just in concept returning dirty bits at unmap
> >>>>>> is more accurate.
> >>>>>>
> >>>>>> I'm slightly inclined to abandon it in the iommufd uAPI.
> >>>>>
> >>>>> Sorry, I'm not following why an unmap with returned dirty bitmap
> >>>>> operation is specific to a vIOMMU case, or in fact indicative of some
> >>>>> sort of erroneous, racy behavior of guest or device.
> >>>>
> >>>> It is being compared against the alternative, which is to explicitly
> >>>> query dirty then do a normal unmap as two system calls and permit a
> >>>> race.
> >>>>
> >>>> The only case with any difference is if the guest is racing DMA with
> >>>> the unmap - in which case it is already indeterminate for the guest if
> >>>> the DMA will be completed or not.
> >>>>
> >>>> eg in the vIOMMU case, if the guest races DMA with unmap then we are
> >>>> already fine with throwing away that DMA, because that is how the race
> >>>> resolves during non-migration situations, so resolving it as throwing
> >>>> away the DMA during migration is OK too.
> >>>>
> >>>>> We need the flexibility to support memory hot-unplug operations
> >>>>> during migration,
> >>>>
> >>>> I would have thought that hot-unplug during migration would simply
> >>>> discard all the data - how does it use the dirty bitmap?
> >>>>
> >>>>> This was implemented as a single operation specifically to avoid
> >>>>> races where ongoing access may be available after retrieving a
> >>>>> snapshot of the bitmap. Thanks,
> >>>>
> >>>> The issue is the cost.
> >>>>
> >>>> On a real iommu, eliminating the race is expensive as we have to write
> >>>> protect the pages before querying dirty, which seems to be an extra
> >>>> IOTLB flush.
> >>>>
> >>>> It is not clear if paying this cost to become atomic is actually
> >>>> something any use case needs.
> >>>>
> >>>> So, I suggest we think about a 3rd op 'write protect and clear
> >>>> dirties' that will be followed by a normal unmap - the extra op will
> >>>> have the extra overhead and userspace can decide if it wants to pay
> >>>> or not vs the non-atomic read dirties operation.
> >>>> And let's have a use case where this must be atomic before we
> >>>> implement it..
> >>>
> >>> and write-protection also relies on the support of I/O page fault...
> >>
> >> /I think/ all IOMMUs in this series already support
> >> permission/unrecoverable I/O page faults for a long time IIUC.
> >>
> >> The earlier suggestion was just to discard the I/O page fault after
> >> write-protection happens. fwiw, some IOMMUs also support suppressing
> >> the event notification (like AMD).
> >
> > iiuc the purpose of 'write-protection' here is to capture in-flight dirty
> > pages in the said race window until the unmap and iotlb invalidation are
> > completed.
>
> But then we depend on PRS being there on the device, because without it,
> DMA is aborted on the target on a read-only IOVA prior to the page fault,
> thus the page is not going to be dirty anyways.
>
> > *unrecoverable* faults are not expected to be used in a feature path
> > as occurrence of such faults may lead to severe reaction in iommu
> > drivers, e.g. completely blocking DMA from the device causing such faults.
>
> Unless I totally misunderstood ... the latter is actually what we were
> suggesting here /in the context of unmapping a GIOVA/(*).
>
> The wrprotect() was there to ensure we get an atomic dirty state of the
> IOVA range afterwards, by blocking DMA (as opposed to sort of mediating
> DMA). The I/O page fault is not supposed to happen unless there's rogue
> DMA AIUI.

You are right. It's me misunderstanding the proposal here. 😊

> TBH, the same could be said for normal DMA unmap, as that does not make
> any sort of guarantee of stopping DMA until the IOTLB flush happens.
>
> (*) Although I am not saying the use-case of wrprotect() and mediating
> dirty pages you describe isn't useful. I guess it is, in a world where we
> want to support post-copy migration with VFs, which would require some
> form of PRI (via the PF?) of the migratable VF. I was just trying to
> differentiate that this is in the context of unmapping an IOVA.
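
For reference, the "vfio compat requirement" discussed above corresponds to
the existing VFIO type1 behaviour where VFIO_IOMMU_UNMAP_DMA can return the
dirty bitmap of the range it unmaps in a single call. Below is a minimal
userspace sketch of that atomic flow; it assumes the struct and flag names
from the v5.8-era dirty-tracking uAPI in include/uapi/linux/vfio.h and that
dirty logging was already started via VFIO_IOMMU_DIRTY_PAGES, so treat it as
illustrative rather than authoritative:

/*
 * Illustrative only: unmap [iova, iova + size) and fetch its dirty bitmap
 * in one VFIO_IOMMU_UNMAP_DMA call - the atomic semantics the thread
 * compares against a separate "read dirty" op followed by a plain unmap.
 * Assumes dirty tracking was started earlier with VFIO_IOMMU_DIRTY_PAGES
 * (VFIO_IOMMU_DIRTY_PAGES_FLAG_START).
 */
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <linux/vfio.h>

static int unmap_and_get_dirty(int container_fd, __u64 iova, __u64 size,
			       __u64 pgsize, __u64 *bitmap,
			       __u64 bitmap_bytes)
{
	struct vfio_iommu_type1_dma_unmap *unmap;
	struct vfio_bitmap *vbmp;
	int ret;

	/* The vfio_bitmap descriptor is carried in the unmap's data[]. */
	unmap = calloc(1, sizeof(*unmap) + sizeof(*vbmp));
	if (!unmap)
		return -1;

	unmap->argsz = sizeof(*unmap) + sizeof(*vbmp);
	unmap->flags = VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP;
	unmap->iova = iova;
	unmap->size = size;

	vbmp = (struct vfio_bitmap *)unmap->data;
	vbmp->pgsize = pgsize;     /* granule: one dirty bit per pgsize */
	vbmp->size = bitmap_bytes; /* size of the bitmap buffer in bytes */
	vbmp->data = bitmap;       /* userspace buffer the kernel fills */

	ret = ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, unmap);
	free(unmap);
	return ret;
}

The 3rd op suggested above would instead split this into a 'write protect
and clear dirties' call followed by a plain unmap, letting userspace choose
between paying the extra IOTLB-flush cost and accepting the non-atomic
read-dirties race.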