Re: [PATCH RFC 00/19] IOMMUFD Dirty Tracking

On 5/5/22 12:03, Tian, Kevin wrote:
>> From: Joao Martins <joao.m.martins@xxxxxxxxxx>
>> Sent: Thursday, May 5, 2022 6:07 PM
>>
>> On 5/5/22 08:42, Tian, Kevin wrote:
>>>> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
>>>> Sent: Tuesday, May 3, 2022 2:53 AM
>>>>
>>>> On Mon, May 02, 2022 at 12:11:07PM -0600, Alex Williamson wrote:
>>>>> On Fri, 29 Apr 2022 05:45:20 +0000
>>>>> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
>>>>>>> From: Joao Martins <joao.m.martins@xxxxxxxxxx>
>>>>>>>  3) Unmapping an IOVA range while returning its dirty bits prior to
>>>>>>> unmap. This case is specific to the non-nested vIOMMU case, where an
>>>>>>> erroneous guest (or device) is DMAing to an address being unmapped
>>>>>>> at the same time.
>>>>>>
>>>>>> an erroneous attempt like the above cannot anticipate which DMAs will
>>>>>> succeed in that window, thus the end behavior is undefined. For
>>>>>> undefined behavior nothing will be broken by losing some bits dirtied
>>>>>> in the window between reading back dirty bits of the range and
>>>>>> actually calling unmap. From the guest's p.o.v. all of this is black-box
>>>>>> hardware logic serving a virtual IOTLB invalidation request which just
>>>>>> cannot be completed in one cycle.
>>>>>>
>>>>>> Hence in reality this is probably not required except to meet the vfio
>>>>>> compat requirement. It's just that in concept returning dirty bits at
>>>>>> unmap is more accurate.
>>>>>>
>>>>>> I'm slightly inclined to abandon it in iommufd uAPI.
>>>>>
>>>>> Sorry, I'm not following why an unmap with returned dirty bitmap
>>>>> operation is specific to a vIOMMU case, or in fact indicative of some
>>>>> sort of erroneous, racy behavior of guest or device.
>>>>
>>>> It is being compared against the alternative which is to explicitly
>>>> query dirty then do a normal unmap as two system calls and permit a
>>>> race.
>>>>
>>>> The only case with any difference is if the guest is racing DMA with
>>>> the unmap - in which case it is already indeterminate for the guest if
>>>> the DMA will be completed or not.
>>>>
>>>> e.g. in the vIOMMU case, if the guest races DMA with unmap then we are
>>>> already fine with throwing away that DMA because that is how the race
>>>> resolves during non-migration situations, so resolving it as throwing
>>>> away the DMA during migration is OK too.
>>>>
>>>>> We need the flexibility to support memory hot-unplug operations
>>>>> during migration,
>>>>
>>>> I would have thought that hot-unplug during migration would simply
>>>> discard all the data - how does it use the dirty bitmap?
>>>>
>>>>> This was implemented as a single operation specifically to avoid
>>>>> races where ongoing access may be available after retrieving a
>>>>> snapshot of the bitmap.  Thanks,
>>>>
>>>> The issue is the cost.
>>>>
>>>> On a real iommu eliminating the race is expensive as we have to write
>>>> protect the pages before querying dirty, which seems to require an extra
>>>> IOTLB flush.
>>>>
>>>> It is not clear if paying this cost to become atomic is actually
>>>> something any use case needs.
>>>>
>>>> So, I suggest we think about a 3rd op 'write protect and clear
>>>> dirties' that will be followed by a normal unmap - the extra op will
>>>> have the extra overhead and userspace can decide if it wants to pay it
>>>> or not vs the non-atomic read dirties operation. And let's have a use
>>>> case where this must be atomic before we implement it...
>>>
>>> and write-protection also relies on the support of I/O page fault...
>>>
>> /I think/ all IOMMUs in this series have supported permission/unrecoverable
>> I/O page faults for a long time, IIUC.
>>
>> The earlier suggestion was just to discard the I/O page fault after
>> write-protection happens. fwiw, some IOMMUs also support suppressing
>> the event notification (like AMD).
> 
> iiuc the purpose of 'write-protection' here is to capture in-flight dirty
> pages in the said race window, until the unmap and IOTLB invalidation are
> completed.
> 
But then we depend on PRS being there on the device, because without it DMA to
a read-only IOVA is aborted at the target prior to any recoverable page fault,
thus the page is not going to be dirty anyway.
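Just to make that dependency concrete, a rough sketch (not code from the
series; the helper name is made up) of the kind of check this hinges on,
i.e. the device actually exposing the PCIe Page Request Interface:

#include <linux/pci.h>

/* Hypothetical helper: wrprotect-based dirty capture only buys anything if
 * the device can take a recoverable (PRI/PRS) fault and retry the access;
 * without it, the write to a read-only IOVA is simply aborted and the page
 * never becomes dirty. */
static bool dev_supports_page_requests(struct pci_dev *pdev)
{
	return pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI) != 0;
}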

> *unrecoverable* faults are not expected to be used in a feature path,
> as the occurrence of such faults may lead to a severe reaction in iommu
> drivers, e.g. completely blocking DMA from the device causing such faults.

Unless I totally misunderstood ... the latter is actually what we were suggesting
here /in the context of unmapping a GIOVA/ (*).

The wrprotect() was there to ensure we get an atomic dirty state of the IOVA range
afterwards, by blocking DMA (as opposed to sort of mediating DMA). The I/O page fault is
not supposed to happen unless there's rogue DMA AIUI.

TBH, the same could be said for a normal DMA unmap, as that does not make any
guarantee of stopping DMA until the IOTLB flush happens.
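
To make the trade-off concrete, here's a rough userspace sketch of the two
flows being discussed (the ioctl names are invented for illustration, not the
actual uAPI proposed in this series):

/* Non-atomic flavour: two calls; a DMA write landing in between dirties a
 * page that the bitmap we already read back will have missed. */
ioctl(iommufd, IOMMUFD_GET_DIRTY_IOVA_HYPOTHETICAL, &dirty);	/* read + clear dirty bits */
ioctl(iommufd, IOMMUFD_UNMAP_IOVA_HYPOTHETICAL, &unmap);	/* unmap + IOTLB flush */

/* With the extra 'write protect and clear dirties' op: no new writes can
 * land after the wrprotect + IOTLB flush, so the dirty snapshot stays
 * stable across the unmap, at the cost of one more IOTLB flush. */
ioctl(iommufd, IOMMUFD_WRPROTECT_CLEAR_DIRTY_HYPOTHETICAL, &wp);
ioctl(iommufd, IOMMUFD_GET_DIRTY_IOVA_HYPOTHETICAL, &dirty);
ioctl(iommufd, IOMMUFD_UNMAP_IOVA_HYPOTHETICAL, &unmap);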

(*) Although I am not saying the use-case of wrprotect() and mediating dirty pages
you mention isn't useful. I guess it is, in a world where we want to support
post-copy migration with VFs, which would require some form of PRI (via the PF?)
on the migratable VF. I was just trying to differentiate that this is in the
context of unmapping an IOVA.


