Re: [Qemu-devel] [RFC PATCH v1 1/4] VFIO KABI for migration interface

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 10/17/2018 3:39 PM, Dr. David Alan Gilbert wrote:
> * Kirti Wankhede (kwankhede@xxxxxxxxxx) wrote:
>> - Added vfio_device_migration_info structure to use interact with vendor
>>   driver.
>> - Different flags are used to get or set migration related information
>>   from/to vendor driver.
>> Flag VFIO_MIGRATION_PROBE: To query if feature is supported
>> Flag VFIO_MIGRATION_GET_REGION: To get migration region info
>> Flag VFIO_MIGRATION_SET_STATE: To convey device state in vendor driver
>> Flag VFIO_MIGRATION_GET_PENDING: To get pending bytes yet to be migrated
>>   from vendor driver
>> Flag VFIO_MIGRATION_GET_BUFFER: On this flag, vendor driver should write
>>   data to migration region and return number of bytes written in the region
>> Flag VFIO_MIGRATION_SET_BUFFER: In migration resume path, user space app
>>   writes to migration region and communicates it to vendor driver with
>>   this ioctl with this flag.
>> Flag VFIO_MIGRATION_GET_DIRTY_PFNS: Get bitmap of dirty pages from vendor
>>   driver from given start address
>>
>> - Added enum for possible device states.
>>
>> Signed-off-by: Kirti Wankhede <kwankhede@xxxxxxxxxx>
>> Reviewed-by: Neo Jia <cjia@xxxxxxxxxx>
>> ---
>>  linux-headers/linux/vfio.h | 91 ++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 91 insertions(+)
>>
>> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
>> index 3615a269d378..8e9045ed9aa8 100644
>> --- a/linux-headers/linux/vfio.h
>> +++ b/linux-headers/linux/vfio.h
>> @@ -602,6 +602,97 @@ struct vfio_device_ioeventfd {
>>  
>>  #define VFIO_DEVICE_IOEVENTFD		_IO(VFIO_TYPE, VFIO_BASE + 16)
>>  
>> +/**
>> + * VFIO_DEVICE_MIGRATION_INFO - _IOW(VFIO_TYPE, VFIO_BASE + 17,
>> + *                              struct vfio_device_migration_info)
> 
> This is a little odd; what you really have here is 7 different
> operations that can be performed; they're independent operations all
> relating to part of migration; so instead of a 'flag' it's more of an
> operation type, and there's no reason to make them individual bit flags
> - for edxample there's no reason you'd want to OR together
> MIGRATION_GET_REGION and MIGRATION_GET_PENDING - they're just
> independent.
> (Similarly you could just make them 7 ioctls)
> 

Right, all flags are independent commands. But I tried to use one ioctl
number.

> I guess what you're trying to do here is a direct 1-1 mapping of qemu's
> struct SaveVMHandlers interface to devices.
> That's not necessarily a bad idea, but remember it's not a stable
> interface, so QEMU will change it over time and you'll have to then
> figure out how to shim these interfaces to it, so it's worth keeping
> that in mind, just in case you can make these interfaces more general.
> 

Alex suggested using sparse region instead of ioctl, I'm making note of
your point to define structures when implementing this interface using
sparse mmap region.


>> + * Flag VFIO_MIGRATION_PROBE:
>> + *      To query if feature is supported
>> + *
>> + * Flag VFIO_MIGRATION_GET_REGION:
>> + *      To get migration region info
>> + *      region_index [output] : region index to be used for migration region
>> + *      size [output]: size of migration region
>> + *
>> + * Flag VFIO_MIGRATION_SET_STATE:
>> + *      To set device state in vendor driver
>> + *      device_state [input] : User space app sends device state to vendor
>> + *           driver on state change
>> + *
>> + * Flag VFIO_MIGRATION_GET_PENDING:
>> + *      To get pending bytes yet to be migrated from vendor driver
>> + *      threshold_size [Input] : threshold of buffer in User space app.
>> + *      pending_precopy_only [output] : pending data which must be migrated in
>> + *          precopy phase or in stopped state, in other words - before target
>> + *          vm start
>> + *      pending_compatible [output] : pending data which may be migrated in any
>> + *           phase
>> + *      pending_postcopy_only [output] : pending data which must be migrated in
>> + *           postcopy phase or in stopped state, in other words - after source
>> + *           vm stop
>> + *      Sum of pending_precopy_only, pending_compatible and
>> + *      pending_postcopy_only is the whole amount of pending data.
>> + *
>> + * Flag VFIO_MIGRATION_GET_BUFFER:
>> + *      On this flag, vendor driver should write data to migration region and
>> + *      return number of bytes written in the region.
>> + *      bytes_written [output] : number of bytes written in migration buffer by
>> + *          vendor driver
>> + *
>> + * Flag VFIO_MIGRATION_SET_BUFFER
>> + *      In migration resume path, user space app writes to migration region and
>> + *      communicates it to vendor driver with this ioctl with this flag.
>> + *      bytes_written [Input] : number of bytes written in migration buffer by
>> + *          user space app.
> 
> I guess we call getbuffer/setbuffer multiple times.  Is there anything
> that needs to be passed to get the data to line up (e.g. offset into the
> data)
> 

I think, vendor driver can keep track of offset as it knows what data
copied and what is remaining.


>> + * Flag VFIO_MIGRATION_GET_DIRTY_PFNS
>> + *      Get bitmap of dirty pages from vendor driver from given start address.
>> + *      start_addr [Input] : start address
>> + *      pfn_count [Input] : Total pfn count from start_addr for which dirty
>> + *          bitmap is requested
>> + *      dirty_bitmap [Output] : bitmap memory allocated by user space
>> + *           application, vendor driver should return the bitmap with bits set
>> + *           only for pages to be marked dirty.
>> + * Return: 0 on success, -errno on failure.
>> + */
>> +
>> +struct vfio_device_migration_info {
>> +	__u32 argsz;
>> +	__u32 flags;
>> +#define VFIO_MIGRATION_PROBE            (1 << 0)
>> +#define VFIO_MIGRATION_GET_REGION       (1 << 1)
>> +#define VFIO_MIGRATION_SET_STATE        (1 << 2)
>> +#define VFIO_MIGRATION_GET_PENDING      (1 << 3)
>> +#define VFIO_MIGRATION_GET_BUFFER       (1 << 4)
>> +#define VFIO_MIGRATION_SET_BUFFER       (1 << 5)
>> +#define VFIO_MIGRATION_GET_DIRTY_PFNS   (1 << 6)
>> +        __u32 region_index;	            /* region index */
>> +        __u64 size;	                    /* size */
>> +        __u32 device_state;                 /* VFIO device state */
>> +        __u64 pending_precopy_only;
>> +        __u64 pending_compatible;
>> +        __u64 pending_postcopy_only;
>> +        __u64 threshold_size;
>> +        __u64 bytes_written;
>> +        __u64 start_addr;
>> +        __u64 pfn_count;
>> +	__u8  dirty_bitmap[];
>> +};
> 
> This feels like it should be separate types for the different calls.
> 
> Also, is there no state to tell the userspace what version of migration
> data we have?  What happens if I try and load a migration state
> from an older driver into a newer one, or the other way around,
> or into a destination card that's not the same as the source.
> 

As I mentioned in other reply, this RFC is mainly focused on core logic
of live migration which include implementation for pre-copy, stop-n-copy
and resume phases, defining device states.

>> +enum {
>> +    VFIO_DEVICE_STATE_NONE,
>> +    VFIO_DEVICE_STATE_RUNNING,
>> +    VFIO_DEVICE_STATE_MIGRATION_SETUP,
>> +    VFIO_DEVICE_STATE_MIGRATION_PRECOPY_ACTIVE,
>> +    VFIO_DEVICE_STATE_MIGRATION_STOPNCOPY_ACTIVE,
>> +    VFIO_DEVICE_STATE_MIGRATION_SAVE_COMPLETED,
>> +    VFIO_DEVICE_STATE_MIGRATION_RESUME,
>> +    VFIO_DEVICE_STATE_MIGRATION_RESUME_COMPLETED,
>> +    VFIO_DEVICE_STATE_MIGRATION_FAILED,
>> +    VFIO_DEVICE_STATE_MIGRATION_CANCELLED,
>> +};
> 
> That's an interesting merge of QEMU's run-state and it's migration
> state.
> 
> You probably need to define which transitions you take as legal.
>

In reply to Alex, I had listed down allowable state transitions.

Thanks,
Kirti

> Dave
> 
>> +
>> +#define VFIO_DEVICE_MIGRATION_INFO _IO(VFIO_TYPE, VFIO_BASE + 17)
>> +
>>  /* -------- API for Type1 VFIO IOMMU -------- */
>>  
>>  /**
>> -- 
>> 2.7.0
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK
> 



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux