Re: [PATCH RFC 00/19] IOMMUFD Dirty Tracking

On 4/29/22 13:38, Jason Gunthorpe wrote:
> On Fri, Apr 29, 2022 at 11:27:58AM +0100, Joao Martins wrote:
>>>>  3) Unmapping an IOVA range while returning its dirty bit prior to
>>>> unmap. This case is specific to the non-nested vIOMMU case, where an
>>>> erroneous guest (or device) is DMAing to an address being unmapped at
>>>> the same time.
>>>
>>> an erroneous attempt like the above cannot anticipate which DMAs will
>>> succeed in that window, thus the end behavior is undefined. For an
>>> undefined behavior nothing is broken by losing some bits dirtied in
>>> the window between reading back the dirty bits of the range and
>>> actually calling unmap. From the guest's p.o.v. all of this is
>>> black-box hardware logic serving a virtual iotlb invalidation request
>>> which just cannot be completed in one cycle.
>>>
>>> Hence in reality this is probably not required except to meet the vfio
>>> compat requirement. It is just that, in concept, returning dirty bits
>>> at unmap is more accurate.
>>>
>>> I'm slightly inclined to abandon it in iommufd uAPI.
>>
>> OK, it seems I am not far off from your thoughts.
>>
>> I'll see what others think too, and if so I'll remove the unmap_dirty.
>>
>> Because if vfio-compat doesn't get the iommu hw dirty support, then there would
>> be no users of unmap_dirty.
> 
> I'm inclined to agree with Kevin.
> 
> If the VM does do a rogue DMA while unmapping its vIOMMU then it will
> already randomly get or lose that DMA. Adding the dirty tracking race
> during live migration just further biases that randomness toward
> losing it.  Since we don't relay protection faults to the guest there is
> no guest-observable difference, IMHO.
> 
Hmm, we don't /yet/. I don't know if that is going to change at some point.

We do propagate MCEs, for example (and AER?). And I suppose that with
nesting IO page faults will be propagated too, albeit that is a different
thing from the problem above.

Even if we do, though, IO page faults induced after the
unmap-and-read-dirty ought not to be propagated to the guest.

> In any case, I don't think the implementation here for unmap_dirty is
> race free?  So, if we are doing all this complexity just to make the
> race smaller, I don't see the point.
> 
+1

> To make it race free I think you have to write-protect the IOPTE, then
> synchronize the IOTLB, read back the dirty bits, then unmap and
> synchronize the IOTLB again.

That would indeed fully close the race with the IOTLB. But damn, it would
be expensive.
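
For reference, this is roughly how I picture that two-step flow; every
helper and type below is an illustrative placeholder, not existing
iommufd or IOMMU driver API:

/* Sketch only: all helpers and types here are placeholders. */
static int unmap_read_dirty_racefree(struct io_pagetable *iopt,
				     unsigned long iova, size_t length,
				     unsigned long *dirty_bitmap)
{
	int rc;

	/*
	 * 1) Write-protect the IOPTEs so new DMA writes fault instead of
	 *    silently dirtying pages we are about to collect.
	 */
	rc = iopt_write_protect(iopt, iova, length);
	if (rc)
		return rc;

	/* 2) Invalidate the IOTLB so no stale writeable entries remain. */
	iopt_flush_iotlb(iopt, iova, length);

	/* 3) Dirty bits are now stable; read them back. */
	rc = iopt_read_dirty(iopt, iova, length, dirty_bitmap);
	if (rc)
		return rc;

	/* 4) Tear the mappings down and invalidate the IOTLB again. */
	iopt_unmap(iopt, iova, length);
	iopt_flush_iotlb(iopt, iova, length);

	return 0;
}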

> That has such a high performance cost I'm not
> convinced it is worthwhile - and if it has to be two step like this
> then it would be cleaner to introduce a 'writeprotect and read dirty'
> op instead of overloading unmap. 

I can switch to that kind of primitive, should the group deem it necessary.
But it feels like we are leaning more towards a no.

> We don't need to micro-optimize away
> the extra io page table walk when we are already paying for two
> invalidations of overhead.
> 
IIUC fully closing the race as above might be incompatible with SMMUv3,
given that we would need to clear DBM (or CD.HD) to move the IOPTEs from
writeable-clean to read-only, at which point the dirty bit loses its
meaning. Oh wait, unless rather than comparing against writeable-clean we
clear DBM and then just check whether the PTE is RO or RW to determine
dirty (provided we discard any IO page faults happening between wrprotect
and read-dirty).
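
Something like the below is what I have in mind for that check. The bit
positions are my reading of the VMSAv8-64 stage-1 descriptor format
(AP[2] at bit 7, DBM at bit 51), so treat them as assumptions to be
double-checked against the Arm ARM / SMMUv3 spec:

#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch; bit positions per my reading of VMSAv8-64. */
#define PTE_AP_RDONLY	(1ULL << 7)	/* AP[2]: 1 = read-only */
#define PTE_DBM		(1ULL << 51)	/* Dirty Bit Modifier */

static bool iopte_is_dirty(uint64_t pte)
{
	/*
	 * With DBM set the HW clears AP[2] on the first write, so a
	 * writeable (AP[2] == 0) entry has been dirtied, while a
	 * read-only (AP[2] == 1) entry is still writeable-clean.
	 */
	return (pte & PTE_DBM) && !(pte & PTE_AP_RDONLY);
}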

>>>> * There's no capabilities API in IOMMUFD, and in this RFC each vendor tracks
>>>
>>> there was discussion adding device capability uAPI somewhere.
>>>
>> Ack, let me know if there are snippets of that conversation, as I seem to have missed it.
> 
> It was just discussion, pending something we actually needed to report.
> 
> Would be a very simple ioctl taking in the device ID and filling a
> struct of stuff.
>  
Yeap.

>>> probably this can be reported as a device cap, as support for the dirty
>>> bit is an immutable property of the iommu serving that device.
> 
> It is an easier fit to read it out of the iommu_domain after device
> attach though - since we don't need to build new kernel infrastructure
> to query it from a device.
>  
That would be more like working on a hwpt_id instead of a device_id for that
previously mentioned ioctl. Something like IOMMUFD_CHECK_EXTENSION, which
receives a capability nr (and optionally a hwpt_id) and returns a struct of
something. That is more future-proof towards new kinds of stuff, e.g.
fetching the whole set of domain hardware capabilities available in the
platform (or in the device, when passed a hwpt_id), or platform reserved
ranges (like the HT hole that AMD systems have, or the 4G hole in x86).
Right now all of that is buried in sysfs, sometimes under the device, even
though some of that info is orthogonal to the device.
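
A strawman of what that could look like on the uAPI side; nothing like
this exists yet, and the names, fields and layout below are purely
illustrative:

#include <linux/types.h>

/* Strawman only: capability numbers and struct layout are made up. */
enum iommufd_cap {
	IOMMUFD_CAP_DIRTY_TRACKING	= 0,
	IOMMUFD_CAP_RESERVED_RANGES	= 1,
};

struct iommu_check_extension {
	__u32 size;		/* sizeof(struct iommu_check_extension) */
	__u32 flags;		/* must be 0 */
	__u32 hwpt_id;		/* or a device id, whichever we settle on */
	__u32 cap;		/* which iommufd_cap is being queried */
	__aligned_u64 out_data;	/* capability-specific answer */
};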

>>> Userspace can
>>> enable dirty tracking on a hwpt if all attached devices claim the
>>> support, and the kernel will do the same verification.
>>
>> Sorry to be dense, but this is not up to 'devices' given they take no
>> part in the tracking?  I guess by 'devices' you mean the software
>> idea of it, i.e. the iommu context created for attaching a given
>> physical device, not the physical device itself.
> 
> Indeed, an hwpt represents an iommu_domain and if the iommu_domain has
> dirty tracking ops set then that is an inherent property of the domain
> and does not suddenly go away when a new device is attached.
>  
> Jason
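
Right. On the kernel side I imagine the gate then boils down to something
like the below (op and field names are illustrative only, not a committed
interface); the point being that support is keyed off the iommu_domain
behind the hwpt, not off whichever devices happen to be attached:

/* Sketch only: op/field names are placeholders. */
static bool hwpt_supports_dirty_tracking(struct iommufd_hw_pagetable *hwpt)
{
	const struct iommu_domain *domain = hwpt->domain;

	return domain->ops &&
	       domain->ops->set_dirty_tracking &&
	       domain->ops->read_and_clear_dirty;
}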


