RE: [RFC PATCH v1 1/4] VFIO KABI for migration interface

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> -----Original Message-----
> From: kvm-owner@xxxxxxxxxxxxxxx [mailto:kvm-owner@xxxxxxxxxxxxxxx] On
> Behalf Of Kirti Wankhede
> Sent: Thursday, October 18, 2018 4:47 AM
> To: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Cc: cjia@xxxxxxxxxx; qemu-devel@xxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; Yulei
> Zhang <yulei.zhang@xxxxxxxxx>; Dr. David Alan Gilbert
> <dgilbert@xxxxxxxxxx>; Juan Quintela <quintela@xxxxxxxxxx>; Wang, Zhi A
> <zhi.a.wang@xxxxxxxxx>
> Subject: Re: [RFC PATCH v1 1/4] VFIO KABI for migration interface
> 
> 
> On 10/17/2018 4:04 AM, Alex Williamson wrote:
> > On Tue, 16 Oct 2018 23:42:35 +0530
> > Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> >
> >> - Added vfio_device_migration_info structure to use interact with vendor
> >>   driver.
> >> - Different flags are used to get or set migration related information
> >>   from/to vendor driver.
> >> Flag VFIO_MIGRATION_PROBE: To query if feature is supported
> >> Flag VFIO_MIGRATION_GET_REGION: To get migration region info
> >> Flag VFIO_MIGRATION_SET_STATE: To convey device state in vendor driver
> >> Flag VFIO_MIGRATION_GET_PENDING: To get pending bytes yet to be
> migrated
> >>   from vendor driver
> >> Flag VFIO_MIGRATION_GET_BUFFER: On this flag, vendor driver should
> write
> >>   data to migration region and return number of bytes written in the
> region
> >> Flag VFIO_MIGRATION_SET_BUFFER: In migration resume path, user space
> app
> >>   writes to migration region and communicates it to vendor driver with
> >>   this ioctl with this flag.
> >> Flag VFIO_MIGRATION_GET_DIRTY_PFNS: Get bitmap of dirty pages from
> vendor
> >>   driver from given start address
> >>
> >> - Added enum for possible device states.
> >>
> >> Signed-off-by: Kirti Wankhede <kwankhede@xxxxxxxxxx>
> >> Reviewed-by: Neo Jia <cjia@xxxxxxxxxx>
> >> ---
> >>  linux-headers/linux/vfio.h | 91
> ++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 91 insertions(+)
> >>
> >> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> >> index 3615a269d378..8e9045ed9aa8 100644
> >> --- a/linux-headers/linux/vfio.h
> >> +++ b/linux-headers/linux/vfio.h
> >> @@ -602,6 +602,97 @@ struct vfio_device_ioeventfd {
> >>
> >>  #define VFIO_DEVICE_IOEVENTFD		_IO(VFIO_TYPE, VFIO_BASE +
> 16)
> >>
> >> +/**
> >> + * VFIO_DEVICE_MIGRATION_INFO - _IOW(VFIO_TYPE, VFIO_BASE + 17,
> >> + *                              struct vfio_device_migration_info)
> >
> > This is quite a bit more than an "INFO" ioctl.
> >
> >> + * Flag VFIO_MIGRATION_PROBE:
> >> + *      To query if feature is supported
> >> + *
> >> + * Flag VFIO_MIGRATION_GET_REGION:
> >> + *      To get migration region info
> >> + *      region_index [output] : region index to be used for migration
> region
> >> + *      size [output]: size of migration region
> >
> > Of course the region migration region can describe itself as being used
> > for migration, so this is unnecessary.  The presence of that region
> > could also negate the need for a probe.
> >
> 
> Yes, that can be done.
> 
> 
> >> + *
> >> + * Flag VFIO_MIGRATION_SET_STATE:
> >> + *      To set device state in vendor driver
> >> + *      device_state [input] : User space app sends device state to
> vendor
> >> + *           driver on state change
> >
> > Valid states are the enum defined below, correct?
> >
> 
> Yes, that's correct.
> 
> > Does setting STOPNCOPY_ACTIVE stop any state change of the device or is
> > that expected to happen through other means?
> >
> 
> _PRECOPY_ACTIVE means vCPUs are still running, so VFIO device should
> still remain active.
> _STOPNCOPY_ACTIVE means vCPUs are not running and device should also be
> stopped and copy device's state.
> 
> > What are the allowable state transitions?
> >
> 
> Normal VM running case:
> _NONE -> _RUNNING
> 
> In case of live migration, at source:
> _RUNNING -> _SETUP -> _PRECOPY_ACTIVE -> _STOPNCOPY_ACTIVE ->
> _SAVE_COMPLETED
> 
> at destination:
> _NONE -> _SETUP -> _RESUME -> _RESUME_COMPLETE -> _RUNNING
> 
> In save VM case:
> _RUNNING -> _SETUP -> _STOPNCOPY_ACTIVE -> _SAVE_COMPLETED
> 
> In case of resuming VM from saved state:
> _NONE -> _SETUP -> _RESUME -> _RESUME_COMPLETE -> _RUNNING
> 
> _FAILED or _CANCELLED can happen in any state.
> 
> > How many bits in flags is a user allowed to set at once?
> >
> 
> One bit at a time. Probably, I should use enum for flags rather than bits.
> 
> >> + * Flag VFIO_MIGRATION_GET_PENDING:
> >> + *      To get pending bytes yet to be migrated from vendor driver
> >> + *      threshold_size [Input] : threshold of buffer in User space app.
> >> + *      pending_precopy_only [output] : pending data which must be
> migrated in
> >> + *          precopy phase or in stopped state, in other words - before
> target
> >> + *          vm start
> >> + *      pending_compatible [output] : pending data which may be
> migrated in any
> >> + *           phase
> >> + *      pending_postcopy_only [output] : pending data which must be
> migrated in
> >> + *           postcopy phase or in stopped state, in other words - after
> source
> >> + *           vm stop
> >> + *      Sum of pending_precopy_only, pending_compatible and
> >> + *      pending_postcopy_only is the whole amount of pending data.
> >
> > What's the significance of the provided threshold, are the pending
> > bytes limited to threshold size?  It makes me nervous to define a
> > kernel API in terms of the internal API of a userspace program that can
> > change.  I wonder if it makes sense to define these in terms of the
> > state of the devices, pending initial data, runtime data, post-offline
> > data.
> >
> 
> Threshold is required, because that will tell size in bytes that user
> space application buffer can accommodate. Driver can copy data less than
> threshold, but copying data more than threshold doesn't make sense
> because user space application won't be able to copy that extra data and
> that data might get overwritten or lost.
> 
> 
> >> + *
> >> + * Flag VFIO_MIGRATION_GET_BUFFER:
> >> + *      On this flag, vendor driver should write data to migration
> >> region and
> >> + *      return number of bytes written in the region.
> >> + *      bytes_written [output] : number of bytes written in
> >> migration buffer by
> >> + *          vendor driver
> >
> > Does the data the user gets here depend on the device state set
> > earlier?  For example the user gets pending_precopy_only data while
> > PRECOPY_ACTIVE is the device state and pending_postcopy_only data
> > in STOPNCOPY_ACTIVE?  The user should continue to call GET_BUFFER
> > in a given state until the associated pending field reaches zero?
> > Jumping between the region and ioctl is rather awkward.
> >
> 
> User gets pending_precopy_only data when device is in PRECOPY_ACTIVE
> state, but each time when user calls GET_BUFFER, pending bytes might
> change.
> VFIO device's driver is producer of data and user/QEMU is consumer of
> data. In pre-copy phase, when vCPUs are still running, driver will try
> to accumulate as much data as possible in this phase, but vCPUs are
> running and user of that device/application in guest is actively using
> that device, then there are chances that during next iteration of
> GET_BUFFER, driver might have more data.
> Even in case of STOPNCOPY_ACTIVE state of device, driver can start
> sending data in parts while a thread in vendor driver can still generate
> data after device has halted, producer and consumer can run in parallel.
> So User has to call GET_BUFFER until pending bytes are returned 0.
> 
How to understand "the driver still generate data after device has halted"?
Do interrupts still be generated after device halted? If so, it will lost interrupt
information in pi_desc.pir .

Thanks,
-Gonglei




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux