RE: [PATCH RFC 00/19] IOMMUFD Dirty Tracking

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> From: Joao Martins <joao.m.martins@xxxxxxxxxx>
> Sent: Thursday, May 5, 2022 7:51 PM
> 
> On 5/5/22 12:03, Tian, Kevin wrote:
> >> From: Joao Martins <joao.m.martins@xxxxxxxxxx>
> >> Sent: Thursday, May 5, 2022 6:07 PM
> >>
> >> On 5/5/22 08:42, Tian, Kevin wrote:
> >>>> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> >>>> Sent: Tuesday, May 3, 2022 2:53 AM
> >>>>
> >>>> On Mon, May 02, 2022 at 12:11:07PM -0600, Alex Williamson wrote:
> >>>>> On Fri, 29 Apr 2022 05:45:20 +0000
> >>>>> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> >>>>>>> From: Joao Martins <joao.m.martins@xxxxxxxxxx>
> >>>>>>>  3) Unmapping an IOVA range while returning its dirty bit prior to
> >>>>>>> unmap. This case is specific for non-nested vIOMMU case where an
> >>>>>>> erronous guest (or device) DMAing to an address being unmapped
> at
> >>>> the
> >>>>>>> same time.
> >>>>>>
> >>>>>> an erroneous attempt like above cannot anticipate which DMAs can
> >>>>>> succeed in that window thus the end behavior is undefined. For an
> >>>>>> undefined behavior nothing will be broken by losing some bits dirtied
> >>>>>> in the window between reading back dirty bits of the range and
> >>>>>> actually calling unmap. From guest p.o.v. all those are black-box
> >>>>>> hardware logic to serve a virtual iotlb invalidation request which just
> >>>>>> cannot be completed in one cycle.
> >>>>>>
> >>>>>> Hence in reality probably this is not required except to meet vfio
> >>>>>> compat requirement. Just in concept returning dirty bits at unmap
> >>>>>> is more accurate.
> >>>>>>
> >>>>>> I'm slightly inclined to abandon it in iommufd uAPI.
> >>>>>
> >>>>> Sorry, I'm not following why an unmap with returned dirty bitmap
> >>>>> operation is specific to a vIOMMU case, or in fact indicative of some
> >>>>> sort of erroneous, racy behavior of guest or device.
> >>>>
> >>>> It is being compared against the alternative which is to explicitly
> >>>> query dirty then do a normal unmap as two system calls and permit a
> >>>> race.
> >>>>
> >>>> The only case with any difference is if the guest is racing DMA with
> >>>> the unmap - in which case it is already indeterminate for the guest if
> >>>> the DMA will be completed or not.
> >>>>
> >>>> eg on the vIOMMU case if the guest races DMA with unmap then we
> are
> >>>> already fine with throwing away that DMA because that is how the race
> >>>> resolves during non-migration situations, so resovling it as throwing
> >>>> away the DMA during migration is OK too.
> >>>>
> >>>>> We need the flexibility to support memory hot-unplug operations
> >>>>> during migration,
> >>>>
> >>>> I would have thought that hotplug during migration would simply
> >>>> discard all the data - how does it use the dirty bitmap?
> >>>>
> >>>>> This was implemented as a single operation specifically to avoid
> >>>>> races where ongoing access may be available after retrieving a
> >>>>> snapshot of the bitmap.  Thanks,
> >>>>
> >>>> The issue is the cost.
> >>>>
> >>>> On a real iommu elminating the race is expensive as we have to write
> >>>> protect the pages before query dirty, which seems to be an extra IOTLB
> >>>> flush.
> >>>>
> >>>> It is not clear if paying this cost to become atomic is actually
> >>>> something any use case needs.
> >>>>
> >>>> So, I suggest we think about a 3rd op 'write protect and clear
> >>>> dirties' that will be followed by a normal unmap - the extra op will
> >>>> have the extra oveheard and userspace can decide if it wants to pay or
> >>>> not vs the non-atomic read dirties operation. And lets have a use case
> >>>> where this must be atomic before we implement it..
> >>>
> >>> and write-protection also relies on the support of I/O page fault...
> >>>
> >> /I think/ all IOMMUs in this series already support
> permission/unrecoverable
> >> I/O page faults for a long time IIUC.
> >>
> >> The earlier suggestion was just to discard the I/O page fault after
> >> write-protection happens. fwiw, some IOMMUs also support suppressing
> >> the event notification (like AMD).
> >
> > iiuc the purpose of 'write-protection' here is to capture in-fly dirty pages
> > in the said race window until unmap and iotlb is invalidated is completed.
> >
> But then we depend on PRS being there on the device, because without it,
> DMA is
> aborted on the target on a read-only IOVA prior to the page fault, thus the
> page
> is not going to be dirty anyways.
> 
> > *unrecoverable* faults are not expected to be used in a feature path
> > as occurrence of such faults may lead to severe reaction in iommu
> > drivers e.g. completely block DMA from the device causing such faults.
> 
> Unless I totally misunderstood ... the later is actually what we were
> suggesting
> here /in the context of unmaping an GIOVA/(*).
> 
> The wrprotect() was there to ensure we get an atomic dirty state of the IOVA
> range
> afterwards, by blocking DMA (as opposed to sort of mediating DMA). The I/O
> page fault is
> not supposed to happen unless there's rogue DMA AIUI.

You are right. It's me misunderstanding the proposal here. 😊

> 
> TBH, the same could be said for normal DMA unmap as that does not make
> any sort of
> guarantees of stopping DMA until the IOTLB flush happens.
> 
> (*) Although I am not saying the use-case of wrprotect() and mediating dirty
> pages you say
> isn't useful. I guess it is in a world where we want support post-copy
> migration with VFs,
> which would require some form of PRI (via the PF?) of the migratable VF. I
> was just trying
> to differentiate that this in the context of unmapping an IOVA.




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux