> From: Joao Martins <joao.m.martins@xxxxxxxxxx>
> Sent: Thursday, May 5, 2022 6:07 PM
>
> On 5/5/22 08:42, Tian, Kevin wrote:
> >> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> >> Sent: Tuesday, May 3, 2022 2:53 AM
> >>
> >> On Mon, May 02, 2022 at 12:11:07PM -0600, Alex Williamson wrote:
> >>> On Fri, 29 Apr 2022 05:45:20 +0000
> >>> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> >>>>> From: Joao Martins <joao.m.martins@xxxxxxxxxx>
> >>>>> 3) Unmapping an IOVA range while returning its dirty bit prior to
> >>>>> unmap. This case is specific to the non-nested vIOMMU case, where
> >>>>> an erroneous guest (or device) is DMAing to an address being
> >>>>> unmapped at the same time.
> >>>>
> >>>> An erroneous attempt like the above cannot anticipate which DMAs
> >>>> will succeed in that window, thus the end behavior is undefined.
> >>>> For undefined behavior nothing is broken by losing some bits
> >>>> dirtied in the window between reading back the dirty bits of the
> >>>> range and actually calling unmap. From the guest p.o.v. all of
> >>>> this is black-box hardware logic serving a virtual IOTLB
> >>>> invalidation request which just cannot be completed in one cycle.
> >>>>
> >>>> Hence in reality this is probably not required except to meet the
> >>>> vfio compat requirement. Just in concept, returning dirty bits at
> >>>> unmap is more accurate.
> >>>>
> >>>> I'm slightly inclined to abandon it in the iommufd uAPI.
> >>>
> >>> Sorry, I'm not following why an unmap with returned dirty bitmap
> >>> operation is specific to a vIOMMU case, or in fact indicative of
> >>> some sort of erroneous, racy behavior of guest or device.
> >>
> >> It is being compared against the alternative, which is to explicitly
> >> query dirty then do a normal unmap as two system calls and permit a
> >> race.
> >>
> >> The only case with any difference is if the guest is racing DMA with
> >> the unmap - in which case it is already indeterminate for the guest
> >> if the DMA will be completed or not.
> >>
> >> eg on the vIOMMU case if the guest races DMA with unmap then we are
> >> already fine with throwing away that DMA, because that is how the
> >> race resolves during non-migration situations, so resolving it as
> >> throwing away the DMA during migration is OK too.
> >>
> >>> We need the flexibility to support memory hot-unplug operations
> >>> during migration,
> >>
> >> I would have thought that hot-unplug during migration would simply
> >> discard all the data - how does it use the dirty bitmap?
> >>
> >>> This was implemented as a single operation specifically to avoid
> >>> races where ongoing access may be available after retrieving a
> >>> snapshot of the bitmap. Thanks,
> >>
> >> The issue is the cost.
> >>
> >> On a real IOMMU eliminating the race is expensive, as we have to
> >> write-protect the pages before querying dirty, which seems to
> >> require an extra IOTLB flush.
> >>
> >> It is not clear that paying this cost to become atomic is something
> >> any use case actually needs.
> >>
> >> So, I suggest we think about a 3rd op 'write protect and clear
> >> dirties' that will be followed by a normal unmap - the extra op
> >> will have the extra overhead and userspace can decide if it wants
> >> to pay or not vs the non-atomic read-dirties operation. And let's
> >> have a use case where this must be atomic before we implement it..
> >
> > and write-protection also relies on the support of I/O page fault...
> >
> /I think/ all IOMMUs in this series have supported
> permission/unrecoverable I/O page faults for a long time, IIUC.
>
> The earlier suggestion was just to discard the I/O page fault after
> write-protection happens. fwiw, some IOMMUs also support suppressing
> the event notification (like AMD).

IIUC the purpose of 'write-protection' here is to capture in-flight
dirty pages in the said race window until the unmap and IOTLB
invalidation are completed.
*unrecoverable* faults are not expected to be used in a feature path,
as the occurrence of such faults may trigger a severe reaction in
IOMMU drivers, e.g. completely blocking DMA from the device causing
such faults.