On 10/17/2018 3:39 PM, Dr. David Alan Gilbert wrote: > * Kirti Wankhede (kwankhede@xxxxxxxxxx) wrote: >> - Added vfio_device_migration_info structure to use interact with vendor >> driver. >> - Different flags are used to get or set migration related information >> from/to vendor driver. >> Flag VFIO_MIGRATION_PROBE: To query if feature is supported >> Flag VFIO_MIGRATION_GET_REGION: To get migration region info >> Flag VFIO_MIGRATION_SET_STATE: To convey device state in vendor driver >> Flag VFIO_MIGRATION_GET_PENDING: To get pending bytes yet to be migrated >> from vendor driver >> Flag VFIO_MIGRATION_GET_BUFFER: On this flag, vendor driver should write >> data to migration region and return number of bytes written in the region >> Flag VFIO_MIGRATION_SET_BUFFER: In migration resume path, user space app >> writes to migration region and communicates it to vendor driver with >> this ioctl with this flag. >> Flag VFIO_MIGRATION_GET_DIRTY_PFNS: Get bitmap of dirty pages from vendor >> driver from given start address >> >> - Added enum for possible device states. >> >> Signed-off-by: Kirti Wankhede <kwankhede@xxxxxxxxxx> >> Reviewed-by: Neo Jia <cjia@xxxxxxxxxx> >> --- >> linux-headers/linux/vfio.h | 91 ++++++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 91 insertions(+) >> >> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h >> index 3615a269d378..8e9045ed9aa8 100644 >> --- a/linux-headers/linux/vfio.h >> +++ b/linux-headers/linux/vfio.h >> @@ -602,6 +602,97 @@ struct vfio_device_ioeventfd { >> >> #define VFIO_DEVICE_IOEVENTFD _IO(VFIO_TYPE, VFIO_BASE + 16) >> >> +/** >> + * VFIO_DEVICE_MIGRATION_INFO - _IOW(VFIO_TYPE, VFIO_BASE + 17, >> + * struct vfio_device_migration_info) > > This is a little odd; what you really have here is 7 different > operations that can be performed; they're independent operations all > relating to part of migration; so instead of a 'flag' it's more of an > operation type, and there's no reason to make them individual bit flags > - for edxample there's no reason you'd want to OR together > MIGRATION_GET_REGION and MIGRATION_GET_PENDING - they're just > independent. > (Similarly you could just make them 7 ioctls) > Right, all flags are independent commands. But I tried to use one ioctl number. > I guess what you're trying to do here is a direct 1-1 mapping of qemu's > struct SaveVMHandlers interface to devices. > That's not necessarily a bad idea, but remember it's not a stable > interface, so QEMU will change it over time and you'll have to then > figure out how to shim these interfaces to it, so it's worth keeping > that in mind, just in case you can make these interfaces more general. > Alex suggested using sparse region instead of ioctl, I'm making note of your point to define structures when implementing this interface using sparse mmap region. >> + * Flag VFIO_MIGRATION_PROBE: >> + * To query if feature is supported >> + * >> + * Flag VFIO_MIGRATION_GET_REGION: >> + * To get migration region info >> + * region_index [output] : region index to be used for migration region >> + * size [output]: size of migration region >> + * >> + * Flag VFIO_MIGRATION_SET_STATE: >> + * To set device state in vendor driver >> + * device_state [input] : User space app sends device state to vendor >> + * driver on state change >> + * >> + * Flag VFIO_MIGRATION_GET_PENDING: >> + * To get pending bytes yet to be migrated from vendor driver >> + * threshold_size [Input] : threshold of buffer in User space app. >> + * pending_precopy_only [output] : pending data which must be migrated in >> + * precopy phase or in stopped state, in other words - before target >> + * vm start >> + * pending_compatible [output] : pending data which may be migrated in any >> + * phase >> + * pending_postcopy_only [output] : pending data which must be migrated in >> + * postcopy phase or in stopped state, in other words - after source >> + * vm stop >> + * Sum of pending_precopy_only, pending_compatible and >> + * pending_postcopy_only is the whole amount of pending data. >> + * >> + * Flag VFIO_MIGRATION_GET_BUFFER: >> + * On this flag, vendor driver should write data to migration region and >> + * return number of bytes written in the region. >> + * bytes_written [output] : number of bytes written in migration buffer by >> + * vendor driver >> + * >> + * Flag VFIO_MIGRATION_SET_BUFFER >> + * In migration resume path, user space app writes to migration region and >> + * communicates it to vendor driver with this ioctl with this flag. >> + * bytes_written [Input] : number of bytes written in migration buffer by >> + * user space app. > > I guess we call getbuffer/setbuffer multiple times. Is there anything > that needs to be passed to get the data to line up (e.g. offset into the > data) > I think, vendor driver can keep track of offset as it knows what data copied and what is remaining. >> + * Flag VFIO_MIGRATION_GET_DIRTY_PFNS >> + * Get bitmap of dirty pages from vendor driver from given start address. >> + * start_addr [Input] : start address >> + * pfn_count [Input] : Total pfn count from start_addr for which dirty >> + * bitmap is requested >> + * dirty_bitmap [Output] : bitmap memory allocated by user space >> + * application, vendor driver should return the bitmap with bits set >> + * only for pages to be marked dirty. >> + * Return: 0 on success, -errno on failure. >> + */ >> + >> +struct vfio_device_migration_info { >> + __u32 argsz; >> + __u32 flags; >> +#define VFIO_MIGRATION_PROBE (1 << 0) >> +#define VFIO_MIGRATION_GET_REGION (1 << 1) >> +#define VFIO_MIGRATION_SET_STATE (1 << 2) >> +#define VFIO_MIGRATION_GET_PENDING (1 << 3) >> +#define VFIO_MIGRATION_GET_BUFFER (1 << 4) >> +#define VFIO_MIGRATION_SET_BUFFER (1 << 5) >> +#define VFIO_MIGRATION_GET_DIRTY_PFNS (1 << 6) >> + __u32 region_index; /* region index */ >> + __u64 size; /* size */ >> + __u32 device_state; /* VFIO device state */ >> + __u64 pending_precopy_only; >> + __u64 pending_compatible; >> + __u64 pending_postcopy_only; >> + __u64 threshold_size; >> + __u64 bytes_written; >> + __u64 start_addr; >> + __u64 pfn_count; >> + __u8 dirty_bitmap[]; >> +}; > > This feels like it should be separate types for the different calls. > > Also, is there no state to tell the userspace what version of migration > data we have? What happens if I try and load a migration state > from an older driver into a newer one, or the other way around, > or into a destination card that's not the same as the source. > As I mentioned in other reply, this RFC is mainly focused on core logic of live migration which include implementation for pre-copy, stop-n-copy and resume phases, defining device states. >> +enum { >> + VFIO_DEVICE_STATE_NONE, >> + VFIO_DEVICE_STATE_RUNNING, >> + VFIO_DEVICE_STATE_MIGRATION_SETUP, >> + VFIO_DEVICE_STATE_MIGRATION_PRECOPY_ACTIVE, >> + VFIO_DEVICE_STATE_MIGRATION_STOPNCOPY_ACTIVE, >> + VFIO_DEVICE_STATE_MIGRATION_SAVE_COMPLETED, >> + VFIO_DEVICE_STATE_MIGRATION_RESUME, >> + VFIO_DEVICE_STATE_MIGRATION_RESUME_COMPLETED, >> + VFIO_DEVICE_STATE_MIGRATION_FAILED, >> + VFIO_DEVICE_STATE_MIGRATION_CANCELLED, >> +}; > > That's an interesting merge of QEMU's run-state and it's migration > state. > > You probably need to define which transitions you take as legal. > In reply to Alex, I had listed down allowable state transitions. Thanks, Kirti > Dave > >> + >> +#define VFIO_DEVICE_MIGRATION_INFO _IO(VFIO_TYPE, VFIO_BASE + 17) >> + >> /* -------- API for Type1 VFIO IOMMU -------- */ >> >> /** >> -- >> 2.7.0 >> >> > -- > Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK >