On 5/5/22 08:42, Tian, Kevin wrote:
>> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
>> Sent: Tuesday, May 3, 2022 2:53 AM
>>
>> On Mon, May 02, 2022 at 12:11:07PM -0600, Alex Williamson wrote:
>>> On Fri, 29 Apr 2022 05:45:20 +0000
>>> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
>>>>> From: Joao Martins <joao.m.martins@xxxxxxxxxx>
>>>>> 3) Unmapping an IOVA range while returning its dirty bit prior to
>>>>> unmap. This case is specific to the non-nested vIOMMU case, where an
>>>>> erroneous guest (or device) is DMAing to an address being unmapped at
>>>>> the same time.
>>>>
>>>> An erroneous attempt like the above cannot anticipate which DMAs will
>>>> succeed in that window, so the end behavior is undefined. For
>>>> undefined behavior, nothing is broken by losing some bits dirtied
>>>> in the window between reading back the dirty bits of the range and
>>>> actually calling unmap. From the guest's p.o.v. all of this is
>>>> black-box hardware logic serving a virtual IOTLB invalidation request
>>>> which just cannot be completed in one cycle.
>>>>
>>>> Hence in reality this is probably not required except to meet the
>>>> vfio compat requirement. It is just that, conceptually, returning
>>>> dirty bits at unmap is more accurate.
>>>>
>>>> I'm slightly inclined to abandon it in the iommufd uAPI.
>>>
>>> Sorry, I'm not following why an unmap-with-returned-dirty-bitmap
>>> operation is specific to a vIOMMU case, or in fact indicative of some
>>> sort of erroneous, racy behavior of the guest or device.
>>
>> It is being compared against the alternative, which is to explicitly
>> query dirty and then do a normal unmap as two system calls, permitting
>> a race.
>>
>> The only case with any difference is if the guest is racing DMA with
>> the unmap - in which case it is already indeterminate for the guest
>> whether the DMA will be completed or not.
>>
>> eg in the vIOMMU case, if the guest races DMA with unmap then we are
>> already fine with throwing away that DMA, because that is how the race
>> resolves during non-migration situations, so resolving it by throwing
>> away the DMA during migration is OK too.
>>
>>> We need the flexibility to support memory hot-unplug operations
>>> during migration,
>>
>> I would have thought that hot-unplug during migration would simply
>> discard all the data - how does it use the dirty bitmap?
>>
>>> This was implemented as a single operation specifically to avoid
>>> races where ongoing access may be available after retrieving a
>>> snapshot of the bitmap. Thanks,
>>
>> The issue is the cost.
>>
>> On a real IOMMU, eliminating the race is expensive, as we have to
>> write-protect the pages before querying dirty, which seems to require
>> an extra IOTLB flush.
>>
>> It is not clear that paying this cost to become atomic is actually
>> something any use case needs.
>>
>> So, I suggest we think about a 3rd op, 'write protect and clear
>> dirties', that would be followed by a normal unmap - the extra op
>> would have the extra overhead, and userspace can decide whether it
>> wants to pay it or not vs the non-atomic read-dirties operation. And
>> let's have a use case where this must be atomic before we implement
>> it..
>
> and write-protection also relies on the support of I/O page fault...
>

/I think/ all the IOMMUs in this series have supported
permission/unrecoverable I/O page faults for a long time, IIUC. The earlier
suggestion was just to discard the I/O page fault after write-protection
happens. fwiw, some IOMMUs also support suppressing the event notification
(like AMD).
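For readers following the thread, the three alternatives being debated (the racy two-syscall query-then-unmap, the atomic vfio-compat unmap-returning-dirty, and Jason's suggested write-protect-and-clear-dirties third op) can be sketched as a toy model. This is pure illustration of the semantics, not kernel or iommufd uAPI code; every name below (`ToyIommu`, `unmap_get_dirty`, etc.) is invented for the example:

```python
class ToyIommu:
    """Toy model of one mapped IOVA range with per-page dirty tracking."""

    def __init__(self, pages):
        self.mapped = set(pages)
        self.dirty = set()
        self.write_protected = False

    def device_write(self, page):
        """DMA write from the device; discarded if unmapped or protected."""
        if page in self.mapped and not self.write_protected:
            self.dirty.add(page)
            return True
        return False  # the DMA is thrown away, as in the race above

    def query_dirty(self):
        """Read-and-clear dirty (first syscall of the racy pair)."""
        snapshot, self.dirty = self.dirty, set()
        return snapshot

    def unmap(self):
        """Plain unmap (second syscall); late dirty bits are lost."""
        self.mapped.clear()
        lost, self.dirty = self.dirty, set()
        return lost  # never reported to userspace

    def unmap_get_dirty(self):
        """Atomic unmap that also returns dirty (the vfio-compat path)."""
        self.mapped.clear()
        return self.query_dirty()

    def write_protect_and_clear_dirty(self):
        """The suggested 3rd op: block further dirtying, return dirty."""
        self.write_protected = True
        return self.query_dirty()


# Racy path: a write landing between query and unmap is lost.
racy = ToyIommu({0, 1})
racy.device_write(0)
seen = racy.query_dirty()      # reports page 0
racy.device_write(1)           # races with the unmap...
lost = racy.unmap()            # ...page 1 is dirtied but never reported

# Write-protect first, then a normal unmap: no window for lost bits.
safe = ToyIommu({0, 1})
safe.device_write(0)
seen = safe.write_protect_and_clear_dirty()  # reports page 0
safe.device_write(1)                          # discarded, not lost
assert safe.unmap() == set()
```

The model makes the cost argument concrete: `unmap_get_dirty` and the write-protect sequence report the same bits, but in real hardware the write-protect step is where the extra IOTLB flush (and hence the extra cost of atomicity) would land.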