On 02/12/2022 10:48, Tian, Kevin wrote:
From: Yishai Hadas <yishaih@xxxxxxxxxx>
Sent: Thursday, December 1, 2022 11:29 PM
+/**
+ * VFIO_MIG_GET_PRECOPY_INFO - _IO(VFIO_TYPE, VFIO_BASE + 21)
+ *
+ * This ioctl is used on the migration data FD in the precopy phase of the
+ * migration data transfer. It returns an estimate of the current data sizes
+ * remaining to be transferred. It allows the user to judge when it is
+ * appropriate to leave PRE_COPY for STOP_COPY.
+ *
+ * This ioctl is valid only in PRE_COPY states and kernel driver should
+ * return -EINVAL from any other migration state.
+ *
+ * The vfio_precopy_info data structure returned by this ioctl provides
+ * estimates of data available from the device during the PRE_COPY states.
+ * This estimate is split into two categories, initial_bytes and
+ * dirty_bytes.
+ *
+ * The initial_bytes field indicates the amount of initial precopy
+ * data available from the device. This field should have a non-zero initial
+ * value and decrease as migration data is read from the device.
+ * It is recommended to leave PRE_COPY for STOP_COPY only after this field
+ * reaches zero. Leaving PRE_COPY earlier might make things slower.
'slower' because partially transferred initial state is wasted and a full
state transfer is still required in STOP_COPY?
Not only that: initial_bytes can serve any driver-specific need to
reduce downtime.
For example, mlx5 uses it to pass some metadata about the state, which
allows the target to prepare ahead of STOP_COPY.
The FW can use this data to allocate host pages in advance, reorganize
its internal data structures accordingly, etc.
Leaving PRE_COPY for STOP_COPY earlier might not give the target the
chance to benefit from that information, so things might be slower
during STOP_COPY.
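
To illustrate the recommendation, here is a minimal sketch of a policy user space might apply to the two estimates. The struct is a local stand-in for the initial_bytes and dirty_bytes fields of vfio_precopy_info, and both ready_for_stop_copy() and the dirty_threshold parameter are hypothetical names invented for this example, not part of the UAPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local stand-in for the two estimate fields of struct vfio_precopy_info.
 * A real caller would fill these from VFIO_MIG_GET_PRECOPY_INFO on the
 * migration data FD.
 */
struct precopy_estimate {
	uint64_t initial_bytes;
	uint64_t dirty_bytes;
};

/*
 * Hypothetical policy: leave PRE_COPY only once the initial data has been
 * fully read (initial_bytes == 0) and the remaining dirty estimate is at
 * or below a caller-chosen threshold.
 */
static bool ready_for_stop_copy(const struct precopy_estimate *est,
				uint64_t dirty_threshold)
{
	assert(est != NULL);

	if (est->initial_bytes != 0)
		return false;	/* target has not seen all initial data yet */

	return est->dirty_bytes <= dirty_threshold;
}
```

The threshold is purely a user-space tuning knob in this sketch; the UAPI only recommends waiting for initial_bytes to reach zero.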
+ *
+ * The dirty_bytes field tracks device state changes relative to data
+ * previously retrieved. This field starts at zero and may increase as
+ * the internal device state is modified or decrease as that modified
+ * state is read from the device.
+ *
+ * Userspace may use the combination of these fields to estimate the
+ * potential data size available during the PRE_COPY phases, as well as
+ * trends relative to the rate the device is dirtying its internal
+ * state, but these fields are not required to have any bearing relative
+ * to the data size available during the STOP_COPY phase.
I didn't get what the last sentence is trying to say. By definition those
fields have nothing to do with the data transferred in STOP_COPY.
Is there an example of what a silly driver might do w/o this caveat?
It is meant to say that user space can't assume anything about the size
of the trailing STOP_COPY data set; this is why it's part of the UAPI
header.
I believe it is better to keep it, as it clarifies things and prevents
mistakes.
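
As an illustration of what this means for user space, here is a minimal sketch of a reader that does not size anything from the precopy estimates and simply reads until end of stream. It assumes the data FD signals the end of the STOP_COPY data by read() returning 0; drain_until_eof() is a hypothetical helper name, and a real consumer would forward each chunk to the target instead of only counting bytes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Read from fd until end of stream, reusing a fixed-size buffer.
 * Returns the total number of bytes read, or -1 on a read error.
 * The total is discovered while reading, never predicted up front.
 */
static int64_t drain_until_eof(int fd, void *buf, size_t buf_len)
{
	int64_t total = 0;

	assert(buf != NULL && buf_len > 0);

	for (;;) {
		ssize_t n = read(fd, buf, buf_len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry the read */
			return -1;
		}
		if (n == 0)
			return total;		/* end of stream */

		total += n;			/* a real caller forwards buf here */
	}
}
```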
Except for the above, this looks good to me:
Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx>
Thanks Kevin, will add your Reviewed-by as part of V3.
Yishai