Re: [PATCH 0/5] QEMU VFIO live migration

Alex Williamson <alex.williamson@xxxxxxxxxx> · Wed, 13 Mar 2019 13:14:54 -0600

On Tue, 12 Mar 2019 21:13:01 -0400
Zhao Yan <yan.y.zhao@xxxxxxxxx> wrote:

> hi Alex
> Any comments to the sequence below?
> 
> Actaully we have some concerns and suggestions to userspace-opaque migration
> data.
> 
> 1. if data is opaque to userspace, kernel interface must be tightly bound to
> migration. 
>    e.g. vendor driver has to know state (running + not logging) should not
>    return any data, and state (running + logging) should return whole
>    snapshot first and dirty later. it also has to know qemu migration will
>    not call GET_BUFFER in state (running + not logging), otherwise, it has
>    to adjust its behavior.

This all just sounds like defining the protocol we expect with the
interface.  For instance if we define a session as beginning when
logging is enabled and ending when the device is stopped and the
interface reports no more data is available, then we can state that any
partial accumulation of data is incomplete relative to migration.  If
userspace wants to initiate a new migration stream, they can simply
toggle logging.  How the vendor driver provides the data during the
session is not defined, but beginning the session with a snapshot
followed by repeated iterations of dirtied data is certainly a valid
approach.

> 2. vendor driver cannot ensure userspace get all the data it intends to
> save in pre-copy phase.
>   e.g. in stop-and-copy phase, vendor driver has to first check and send
>   data in previous phase.

First, I don't think the device has control of when QEMU switches from
pre-copy to stop-and-copy, the protocol needs to support that
transition at any point.  However, it seems a simply data available
counter provides an indication of when it might be optimal to make such
a transition.  If a vendor driver follows a scheme as above, the
available data counter would indicate a large value, the entire initial
snapshot of the device.  As the migration continues and pages are
dirtied, the device would reach a steady state amount of data
available, depending on the guest activity.  This could indicate to the
user to stop the device.  The migration stream would not be considered
completed until the available data counter reaches zero while the
device is in the stopped|logging state.

> 3. if all the sequence is tightly bound to live migration, can we remove the
> logging state? what about adding two states migrate-in and migrate-out?
> so there are four states: running, stopped, migrate-in, migrate-out.
>    migrate-out is for source side when migration starts. together with
>    state running and stopped, it can substitute state logging.
>    migrate-in is for target side.

In fact, Kirti's implementation specifies a data direction, but I think
we still need logging to indicate sessions.  I'd also assume that
logging implies some overhead for the vendor driver.

> On Tue, Mar 12, 2019 at 10:57:47AM +0800, Zhao Yan wrote:
> > hi Alex
> > thanks for your reply.
> > 
> > So, if we choose migration data to be userspace opaque, do you think below
> > sequence is the right behavior for vendor driver to follow:
> > 
> > 1. initially LOGGING state is not set. If userspace calls GET_BUFFER to
> > vendor driver,  vendor driver should reject and return 0.

What would this state mean otherwise?  If we're not logging then it
should not be expected that we can construct dirtied data from a
previous read of the state before logging was enabled (it would be
outside of the "session").  So at best this is an incomplete segment of
the initial snapshot of the device, but that presumes how the vendor
driver constructs the data.  I wouldn't necessarily mandate the vendor
driver reject it, but I think we should consider it undefined and
vendor specific relative to the migration interface.

> > 2. then LOGGING state is set, if userspace calls GET_BUFFER to vendor
> > driver,
> >    a. vendor driver shoud first query a whole snapshot of device memory
> >    (let's use this term to represent device's standalone memory for now),
> >    b. vendor driver returns a chunk of data just queried to userspace,
> >    while recording current pos in data.
> >    c. vendor driver finds all data just queried is finished transmitting to
> >    userspace, and queries only dirty data in device memory now.
> >    d. vendor driver returns a chunk of data just quered (this time is dirty
> >    data )to userspace while recording current pos in data
> >    e. if all data is transmited to usespace and still GET_BUFFERs come from
> >    userspace, vendor driver starts another round of dirty data query.

This is a valid vendor driver approach, but it's outside the scope of
the interface definition.  A vendor driver could also decide to not
provide any data until both stopped and logging are set and then
provide a fixed, final snapshot.  The interface supports either
approach by defining the protocol to interact with it.

> > 3. if LOGGING state is unset then, and userpace calls GET_BUFFER to vendor
> > driver,
> >    a. if vendor driver finds there's previously untransmitted data, returns
> >    them until all transmitted.
> >    b. vendor driver then queries dirty data again and transmits them.
> >    c. at last, vendor driver queris device config data (which has to be
> >    queried at last and sent once) and transmits them.

This seems broken, the vendor driver is presuming the user intentions.
If logging is unset, we return to bullet 1, reading data is undefined
and vendor specific.  It's outside of the session.

> > for the 1 bullet, if LOGGING state is firstly set and migration aborts
> > then,  vendor driver has to be able to detect that condition. so seemingly,
> > vendor driver has to know more qemu's migration state, like migration
> > called and failed. Do you think that's acceptable?

If migration aborts, logging is cleared and the device continues
operation.  If a new migration is started, the session is initiated by
enabling logging.  Sound reasonable?  Thanks,

Alex