> -----Original Message----- > From: kvm-owner@xxxxxxxxxxxxxxx [mailto:kvm-owner@xxxxxxxxxxxxxxx] On > Behalf Of Kirti Wankhede > Sent: Thursday, October 18, 2018 4:47 AM > To: Alex Williamson <alex.williamson@xxxxxxxxxx> > Cc: cjia@xxxxxxxxxx; qemu-devel@xxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; Yulei > Zhang <yulei.zhang@xxxxxxxxx>; Dr. David Alan Gilbert > <dgilbert@xxxxxxxxxx>; Juan Quintela <quintela@xxxxxxxxxx>; Wang, Zhi A > <zhi.a.wang@xxxxxxxxx> > Subject: Re: [RFC PATCH v1 1/4] VFIO KABI for migration interface > > > On 10/17/2018 4:04 AM, Alex Williamson wrote: > > On Tue, 16 Oct 2018 23:42:35 +0530 > > Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote: > > > >> - Added vfio_device_migration_info structure to use interact with vendor > >> driver. > >> - Different flags are used to get or set migration related information > >> from/to vendor driver. > >> Flag VFIO_MIGRATION_PROBE: To query if feature is supported > >> Flag VFIO_MIGRATION_GET_REGION: To get migration region info > >> Flag VFIO_MIGRATION_SET_STATE: To convey device state in vendor driver > >> Flag VFIO_MIGRATION_GET_PENDING: To get pending bytes yet to be > migrated > >> from vendor driver > >> Flag VFIO_MIGRATION_GET_BUFFER: On this flag, vendor driver should > write > >> data to migration region and return number of bytes written in the > region > >> Flag VFIO_MIGRATION_SET_BUFFER: In migration resume path, user space > app > >> writes to migration region and communicates it to vendor driver with > >> this ioctl with this flag. > >> Flag VFIO_MIGRATION_GET_DIRTY_PFNS: Get bitmap of dirty pages from > vendor > >> driver from given start address > >> > >> - Added enum for possible device states. > >> > >> Signed-off-by: Kirti Wankhede <kwankhede@xxxxxxxxxx> > >> Reviewed-by: Neo Jia <cjia@xxxxxxxxxx> > >> --- > >> linux-headers/linux/vfio.h | 91 > ++++++++++++++++++++++++++++++++++++++++++++++ > >> 1 file changed, 91 insertions(+) > >> > >> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h > >> index 3615a269d378..8e9045ed9aa8 100644 > >> --- a/linux-headers/linux/vfio.h > >> +++ b/linux-headers/linux/vfio.h > >> @@ -602,6 +602,97 @@ struct vfio_device_ioeventfd { > >> > >> #define VFIO_DEVICE_IOEVENTFD _IO(VFIO_TYPE, VFIO_BASE + > 16) > >> > >> +/** > >> + * VFIO_DEVICE_MIGRATION_INFO - _IOW(VFIO_TYPE, VFIO_BASE + 17, > >> + * struct vfio_device_migration_info) > > > > This is quite a bit more than an "INFO" ioctl. > > > >> + * Flag VFIO_MIGRATION_PROBE: > >> + * To query if feature is supported > >> + * > >> + * Flag VFIO_MIGRATION_GET_REGION: > >> + * To get migration region info > >> + * region_index [output] : region index to be used for migration > region > >> + * size [output]: size of migration region > > > > Of course the region migration region can describe itself as being used > > for migration, so this is unnecessary. The presence of that region > > could also negate the need for a probe. > > > > Yes, that can be done. > > > >> + * > >> + * Flag VFIO_MIGRATION_SET_STATE: > >> + * To set device state in vendor driver > >> + * device_state [input] : User space app sends device state to > vendor > >> + * driver on state change > > > > Valid states are the enum defined below, correct? > > > > Yes, that's correct. > > > Does setting STOPNCOPY_ACTIVE stop any state change of the device or is > > that expected to happen through other means? > > > > _PRECOPY_ACTIVE means vCPUs are still running, so VFIO device should > still remain active. > _STOPNCOPY_ACTIVE means vCPUs are not running and device should also be > stopped and copy device's state. > > > What are the allowable state transitions? > > > > Normal VM running case: > _NONE -> _RUNNING > > In case of live migration, at source: > _RUNNING -> _SETUP -> _PRECOPY_ACTIVE -> _STOPNCOPY_ACTIVE -> > _SAVE_COMPLETED > > at destination: > _NONE -> _SETUP -> _RESUME -> _RESUME_COMPLETE -> _RUNNING > > In save VM case: > _RUNNING -> _SETUP -> _STOPNCOPY_ACTIVE -> _SAVE_COMPLETED > > In case of resuming VM from saved state: > _NONE -> _SETUP -> _RESUME -> _RESUME_COMPLETE -> _RUNNING > > _FAILED or _CANCELLED can happen in any state. > > > How many bits in flags is a user allowed to set at once? > > > > One bit at a time. Probably, I should use enum for flags rather than bits. > > >> + * Flag VFIO_MIGRATION_GET_PENDING: > >> + * To get pending bytes yet to be migrated from vendor driver > >> + * threshold_size [Input] : threshold of buffer in User space app. > >> + * pending_precopy_only [output] : pending data which must be > migrated in > >> + * precopy phase or in stopped state, in other words - before > target > >> + * vm start > >> + * pending_compatible [output] : pending data which may be > migrated in any > >> + * phase > >> + * pending_postcopy_only [output] : pending data which must be > migrated in > >> + * postcopy phase or in stopped state, in other words - after > source > >> + * vm stop > >> + * Sum of pending_precopy_only, pending_compatible and > >> + * pending_postcopy_only is the whole amount of pending data. > > > > What's the significance of the provided threshold, are the pending > > bytes limited to threshold size? It makes me nervous to define a > > kernel API in terms of the internal API of a userspace program that can > > change. I wonder if it makes sense to define these in terms of the > > state of the devices, pending initial data, runtime data, post-offline > > data. > > > > Threshold is required, because that will tell size in bytes that user > space application buffer can accommodate. Driver can copy data less than > threshold, but copying data more than threshold doesn't make sense > because user space application won't be able to copy that extra data and > that data might get overwritten or lost. > > > >> + * > >> + * Flag VFIO_MIGRATION_GET_BUFFER: > >> + * On this flag, vendor driver should write data to migration > >> region and > >> + * return number of bytes written in the region. > >> + * bytes_written [output] : number of bytes written in > >> migration buffer by > >> + * vendor driver > > > > Does the data the user gets here depend on the device state set > > earlier? For example the user gets pending_precopy_only data while > > PRECOPY_ACTIVE is the device state and pending_postcopy_only data > > in STOPNCOPY_ACTIVE? The user should continue to call GET_BUFFER > > in a given state until the associated pending field reaches zero? > > Jumping between the region and ioctl is rather awkward. > > > > User gets pending_precopy_only data when device is in PRECOPY_ACTIVE > state, but each time when user calls GET_BUFFER, pending bytes might > change. > VFIO device's driver is producer of data and user/QEMU is consumer of > data. In pre-copy phase, when vCPUs are still running, driver will try > to accumulate as much data as possible in this phase, but vCPUs are > running and user of that device/application in guest is actively using > that device, then there are chances that during next iteration of > GET_BUFFER, driver might have more data. > Even in case of STOPNCOPY_ACTIVE state of device, driver can start > sending data in parts while a thread in vendor driver can still generate > data after device has halted, producer and consumer can run in parallel. > So User has to call GET_BUFFER until pending bytes are returned 0. > How to understand "the driver still generate data after device has halted"? Do interrupts still be generated after device halted? If so, it will lost interrupt information in pi_desc.pir . Thanks, -Gonglei