On Sun, 1 May 2022 15:33:00 +0300
Yishai Hadas <yishaih@xxxxxxxxxx> wrote:

> DMA logging allows a device to internally record what DMAs the device is
> initiation and report them back to userspace.
>
> It is part of the VFIO migration infrastructure that allows implementing
> dirty page tracking during the pre-copy phase of live migration.
>
> Only DMA WRITEs are logged, and this API is not connected to
> VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE.
>
> This RFC patch shows the expected usage of the DMA logging involved
> uAPIs for VFIO device-tracker.
>
> It uses the FEATURE ioctl with its GET/SET/PROBE options as of below.
>
> It exposes a PROBE option to detect if the device supports DMA logging.
>
> It exposes a SET option to start device DMA logging in given of IOVA
> ranges.
>
> It exposes a SET option to stop device DMA logging that was previously
> started.
>
> It exposes a GET option to read back and clear the device DMA log.
>
> Extra details exist as part of vfio.h per a specific option in this RFC
> patch.
>
> Note:
> To have IOMMU hardware support for dirty pages the below RFC [1] that
> was sent by Joao Martins can be referenced.
>
> [1] https://lore.kernel.org/all/2d369e58-8ac0-f263-7b94-fe73917782e1@xxxxxxxxxxxxxxx/T/
>
> Signed-off-by: Yishai Hadas <yishaih@xxxxxxxxxx>
> Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> ---
>  include/uapi/linux/vfio.h | 80 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 80 insertions(+)
>
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index fea86061b44e..9d0b7e73e999 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -986,6 +986,86 @@ enum vfio_device_mig_state {
>  	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
>  };
>  
> +/*
> + * Upon VFIO_DEVICE_FEATURE_SET start device DMA logging.
> + * VFIO_DEVICE_FEATURE_PROBE can be used to detect if the device supports
> + * DMA logging.
> + *
> + * DMA logging allows a device to internally record what DMAs the device is
> + * initiation and report them back to userspace. It is part of the VFIO
> + * migration infrastructure that allows implementing dirty page tracking
> + * during the pre copy phase of live migration. Only DMA WRITEs are logged,
> + * and this API is not connected to VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE.
> + *
> + * When DMA logging is started a range of IOVAs to monitor is provided and the
> + * device can optimize its logging to cover only the IOVA range given. Each
> + * DMA that the device initiates inside the range will be logged by the device
> + * for later retrieval.
> + *
> + * page_size is an input that hints what tracking granularity the device
> + * should try to achieve. If the device cannot do the hinted page size then it
> + * should pick the next closest page size it supports. On output the device
> + * will return the page size it selected.
> + *
> + * ranges is a pointer to an array of
> + * struct vfio_device_feature_dma_logging_range.
> + */
> +struct vfio_device_feature_dma_logging_control {
> +	__aligned_u64 page_size;
> +	__u32 num_ranges;
> +	__u32 __reserved;
> +	__aligned_u64 ranges;
> +};
> +
> +struct vfio_device_feature_dma_logging_range {
> +	__aligned_u64 iova;
> +	__aligned_u64 length;
> +};
> +
> +#define VFIO_DEVICE_FEATURE_DMA_LOGGING_START 3
> +
> +
> +/*
> + * Upon VFIO_DEVICE_FEATURE_SET stop device DMA logging that was started
> + * by VFIO_DEVICE_FEATURE_DMA_LOGGING_START
> + */
> +#define VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP 4

This seems difficult to use from a QEMU perspective, where a vfio device
typically operates on a MemoryListener and we only have visibility to
one range at a time.  I don't see any indication that LOGGING_START is
meant to be cumulative such that userspace could incrementally add
ranges to be watched, nor does LOGGING_STOP appear to have any sort of
IOVA range granularity.
Is userspace intended to pass the full vCPU physical address range here,
and if so would a single min/max IOVA be sufficient?  I'm not sure how
else we could support memory hotplug while this was enabled.

How does this work with IOMMU-based tracking?  I assume that if devices
share an IOAS, we wouldn't be able to exclude devices supporting
device-level tracking from the IOAS log.

> +
> +/*
> + * Upon VFIO_DEVICE_FEATURE_GET read back and clear the device DMA log
> + *
> + * Query the device's DMA log for written pages within the given IOVA range.
> + * During querying the log is cleared for the IOVA range.
> + *
> + * bitmap is a pointer to an array of u64s that will hold the output bitmap
> + * with 1 bit reporting a page_size unit of IOVA. The mapping of IOVA to bits
> + * is given by:
> + *  bitmap[(addr - iova)/page_size] & (1ULL << (addr % 64))
> + *
> + * The input page_size can be any power of two value and does not have to
> + * match the value given to VFIO_DEVICE_FEATURE_DMA_LOGGING_START. The driver
> + * will format its internal logging to match the reporting page size, possibly
> + * by replicating bits if the internal page size is lower than requested.

Or setting multiple bits if the internal page size is larger than
requested.  Is there a bitmap size limit?  We've minimally needed to
impose limits to reflect limitations of the bitmap code internally in
the past.  Userspace needs a means to learn such limits.

Thanks,
Alex

> + *
> + * Bits will be updated in bitmap using atomic or to allow userspace to
> + * combine bitmaps from multiple trackers together. Therefore userspace must
> + * zero the bitmap before doing any reports.
> + *
> + * If any error is returned userspace should assume that the dirty log is
> + * corrupted and restart.
> + *
> + * If DMA logging is not enabled, an error will be returned.
> + *
> + */
> +struct vfio_device_feature_dma_logging_report {
> +	__aligned_u64 iova;
> +	__aligned_u64 length;
> +	__aligned_u64 page_size;
> +	__aligned_u64 bitmap;
> +};
> +
> +#define VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT 5
> +
>  /* -------- API for Type1 VFIO IOMMU -------- */
>  
>  /**
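
To make the bitmap size question concrete, here is a minimal userspace
sketch of how a REPORT bitmap might be sized and indexed. It assumes the
conventional layout where page n = (addr - iova) / page_size is stored in
u64 word n / 64 at bit n % 64; the formula in the quoted comment is
written differently, so this reading is an assumption, not what the patch
specifies. The helper names dma_log_bitmap_words() and
dma_log_page_dirty() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of u64 words userspace would need to
 * allocate (and zero) for a report covering `length` bytes of IOVA at
 * `page_size` granularity, one bit per page.
 */
static inline uint64_t dma_log_bitmap_words(uint64_t length, uint64_t page_size)
{
	/* The quoted comment requires page_size to be a power of two. */
	assert(page_size && (page_size & (page_size - 1)) == 0);

	/* Round up without risking overflow on length + page_size. */
	uint64_t pages = length / page_size + (length % page_size != 0);

	return pages / 64 + (pages % 64 != 0);
}

/*
 * Hypothetical helper: test whether the page containing `addr` is marked
 * dirty, ASSUMING page n lives in word n / 64 at bit n % 64.
 * The caller must ensure iova <= addr < iova + length.
 */
static inline bool dma_log_page_dirty(const uint64_t *bitmap, uint64_t iova,
				      uint64_t page_size, uint64_t addr)
{
	uint64_t bit = (addr - iova) / page_size;

	return (bitmap[bit / 64] >> (bit % 64)) & 1;
}
```

Under this assumed layout, a 1MB range at 4KB granularity needs only 4
words, while a multi-terabyte range at the same granularity needs tens of
megabytes of bitmap per call, which is where an advertised limit would
matter.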