Re: [PATCH RFC 00/19] IOMMUFD Dirty Tracking

Jason Gunthorpe <jgg@xxxxxxxxxx> · Fri, 29 Apr 2022 09:38:41 -0300

On Fri, Apr 29, 2022 at 11:27:58AM +0100, Joao Martins wrote:
> >>  3) Unmapping an IOVA range while returning its dirty bit prior to
> >> unmap. This case is specific for non-nested vIOMMU case where an
> >> erronous guest (or device) DMAing to an address being unmapped at the
> >> same time.
> > 
> > an erroneous attempt like above cannot anticipate which DMAs can
> > succeed in that window thus the end behavior is undefined. For an
> > undefined behavior nothing will be broken by losing some bits dirtied
> > in the window between reading back dirty bits of the range and
> > actually calling unmap. From guest p.o.v. all those are black-box
> > hardware logic to serve a virtual iotlb invalidation request which just
> > cannot be completed in one cycle.
> > 
> > Hence in reality probably this is not required except to meet vfio
> > compat requirement. Just in concept returning dirty bits at unmap
> > is more accurate.
> > 
> > I'm slightly inclined to abandon it in iommufd uAPI.
> 
> OK, it seems I am not far off from your thoughts.
> 
> I'll see what others think too, and if so I'll remove the unmap_dirty.
> 
> Because if vfio-compat doesn't get the iommu hw dirty support, then there would
> be no users of unmap_dirty.

I'm inclined to agree with Kevin.

If the VM does do a rouge DMA while unmapping its vIOMMU then already
it will randomly get or loose that DMA. Adding the dirty tracking race
during live migration just further bias's that randomness toward
loose.  Since we don't relay protection faults to the guest there is
no guest observable difference, IMHO.

In any case, I don't think the implementation here for unmap_dirty is
race free?  So, if we are doing all this complexity just to make the
race smaller, I don't see the point.

To make it race free I think you have to write protect the IOPTE then
synchronize the IOTLB, read back the dirty, then unmap and synchronize
the IOTLB again. That has such a high performance cost I'm not
convinced it is worthwhile - and if it has to be two step like this
then it would be cleaner to introduce a 'writeprotect and read dirty'
op instead of overloading unmap. We don't need to microoptimize away
the extra io page table walk when we are already doing two
invalidations in the overhead..

> >> * There's no capabilities API in IOMMUFD, and in this RFC each vendor tracks
> > 
> > there was discussion adding device capability uAPI somewhere.
> > 
> ack let me know if there was snippets to the conversation as I seem to have missed that.

It was just discssion pending something we actually needed to report.

Would be a very simple ioctl taking in the device ID and fulling a
struct of stuff.

> > probably this can be reported as a device cap as supporting of dirty bit is
> > an immutable property of the iommu serving that device. 

It is an easier fit to read it out of the iommu_domain after device
attach though - since we don't need to build new kernel infrastructure
to query it from a device.

> > Userspace can
> > enable dirty tracking on a hwpt if all attached devices claim the support
> > and kernel will does the same verification.
> 
> Sorry to be dense but this is not up to 'devices' given they take no
> part in the tracking?  I guess by 'devices' you mean the software
> idea of it i.e. the iommu context created for attaching a said
> physical device, not the physical device itself.

Indeed, an hwpt represents an iommu_domain and if the iommu_domain has
dirty tracking ops set then that is an inherent propery of the domain
and does not suddenly go away when a new device is attached.

Jason