Re: [PATCH 0/5] QEMU VFIO live migration

Zhao Yan <yan.y.zhao@xxxxxxxxx> · Wed, 13 Mar 2019 21:12:22 -0400

On Thu, Mar 14, 2019 at 03:14:54AM +0800, Alex Williamson wrote:
> On Tue, 12 Mar 2019 21:13:01 -0400
> Zhao Yan <yan.y.zhao@xxxxxxxxx> wrote:
> 
> > hi Alex
> > Any comments to the sequence below?
> > 
> > Actaully we have some concerns and suggestions to userspace-opaque migration
> > data.
> > 
> > 1. if data is opaque to userspace, kernel interface must be tightly bound to
> > migration. 
> >    e.g. vendor driver has to know state (running + not logging) should not
> >    return any data, and state (running + logging) should return whole
> >    snapshot first and dirty later. it also has to know qemu migration will
> >    not call GET_BUFFER in state (running + not logging), otherwise, it has
> >    to adjust its behavior.
> 
> This all just sounds like defining the protocol we expect with the
> interface.  For instance if we define a session as beginning when
> logging is enabled and ending when the device is stopped and the
> interface reports no more data is available, then we can state that any
> partial accumulation of data is incomplete relative to migration.  If
> userspace wants to initiate a new migration stream, they can simply
> toggle logging.  How the vendor driver provides the data during the
> session is not defined, but beginning the session with a snapshot
> followed by repeated iterations of dirtied data is certainly a valid
> approach.
> 
> > 2. vendor driver cannot ensure userspace get all the data it intends to
> > save in pre-copy phase.
> >   e.g. in stop-and-copy phase, vendor driver has to first check and send
> >   data in previous phase.
> 
> First, I don't think the device has control of when QEMU switches from
> pre-copy to stop-and-copy, the protocol needs to support that
> transition at any point.  However, it seems a simply data available
> counter provides an indication of when it might be optimal to make such
> a transition.  If a vendor driver follows a scheme as above, the
> available data counter would indicate a large value, the entire initial
> snapshot of the device.  As the migration continues and pages are
> dirtied, the device would reach a steady state amount of data
> available, depending on the guest activity.  This could indicate to the
> user to stop the device.  The migration stream would not be considered
> completed until the available data counter reaches zero while the
> device is in the stopped|logging state.
> 
> > 3. if all the sequence is tightly bound to live migration, can we remove the
> > logging state? what about adding two states migrate-in and migrate-out?
> > so there are four states: running, stopped, migrate-in, migrate-out.
> >    migrate-out is for source side when migration starts. together with
> >    state running and stopped, it can substitute state logging.
> >    migrate-in is for target side.
> 
> In fact, Kirti's implementation specifies a data direction, but I think
> we still need logging to indicate sessions.  I'd also assume that
> logging implies some overhead for the vendor driver.
>
ok. If you prefer logging, I'm ok with it. just found migrate-in and
migrate-out are more universal againt hardware requirement changes.

> > On Tue, Mar 12, 2019 at 10:57:47AM +0800, Zhao Yan wrote:
> > > hi Alex
> > > thanks for your reply.
> > > 
> > > So, if we choose migration data to be userspace opaque, do you think below
> > > sequence is the right behavior for vendor driver to follow:
> > > 
> > > 1. initially LOGGING state is not set. If userspace calls GET_BUFFER to
> > > vendor driver,  vendor driver should reject and return 0.
> 
> What would this state mean otherwise?  If we're not logging then it
> should not be expected that we can construct dirtied data from a
> previous read of the state before logging was enabled (it would be
> outside of the "session").  So at best this is an incomplete segment of
> the initial snapshot of the device, but that presumes how the vendor
> driver constructs the data.  I wouldn't necessarily mandate the vendor
> driver reject it, but I think we should consider it undefined and
> vendor specific relative to the migration interface.
> 
> > > 2. then LOGGING state is set, if userspace calls GET_BUFFER to vendor
> > > driver,
> > >    a. vendor driver shoud first query a whole snapshot of device memory
> > >    (let's use this term to represent device's standalone memory for now),
> > >    b. vendor driver returns a chunk of data just queried to userspace,
> > >    while recording current pos in data.
> > >    c. vendor driver finds all data just queried is finished transmitting to
> > >    userspace, and queries only dirty data in device memory now.
> > >    d. vendor driver returns a chunk of data just quered (this time is dirty
> > >    data )to userspace while recording current pos in data
> > >    e. if all data is transmited to usespace and still GET_BUFFERs come from
> > >    userspace, vendor driver starts another round of dirty data query.
> 
> This is a valid vendor driver approach, but it's outside the scope of
> the interface definition.  A vendor driver could also decide to not
> provide any data until both stopped and logging are set and then
> provide a fixed, final snapshot.  The interface supports either
> approach by defining the protocol to interact with it.
> 
> > > 3. if LOGGING state is unset then, and userpace calls GET_BUFFER to vendor
> > > driver,
> > >    a. if vendor driver finds there's previously untransmitted data, returns
> > >    them until all transmitted.
> > >    b. vendor driver then queries dirty data again and transmits them.
> > >    c. at last, vendor driver queris device config data (which has to be
> > >    queried at last and sent once) and transmits them.
> 
> This seems broken, the vendor driver is presuming the user intentions.
> If logging is unset, we return to bullet 1, reading data is undefined
> and vendor specific.  It's outside of the session.
> 
> > > for the 1 bullet, if LOGGING state is firstly set and migration aborts
> > > then,  vendor driver has to be able to detect that condition. so seemingly,
> > > vendor driver has to know more qemu's migration state, like migration
> > > called and failed. Do you think that's acceptable?
> 
> If migration aborts, logging is cleared and the device continues
> operation.  If a new migration is started, the session is initiated by
> enabling logging.  Sound reasonable?  Thanks,
>

For the flow, I still have a question.
There are 2 approaches below, which one do you prefer?

Approach A, in precopy stage, the sequence is

(1)
.save_live_pending --> return whole snapshot size
.save_live_iterate --> save whole snapshot

(2)
.save_live_pending --> get dirty data, return dirty data size
.save_live_iterate --> save all dirty data

(3)
.save_live_pending --> get dirty data again, return dirty data size
.save_live_iterate --> save all dirty data

Approach B, in precopy stage, the sequence is
(1)
.save_live_pending --> return whole snapshot size
.save_live_iterate --> save part of snapshot

(2)
.save_live_pending --> return rest part of whole snapshot size +
                              current dirty data size
.save_live_iterate --> save part of snapshot 

(3) repeat (2) until whole snapshot saved.

(4) 
.save_live_pending --> get diryt data and return current dirty data size
.save_live_iterate --> save part of dirty data

(5)
.save_live_pending --> return reset part of dirty data size +
			     delta size of dirty data
.save_live_iterate --> save part of dirty data

(6)
repeat (5) until precopy stops

> Alex
> _______________________________________________
> intel-gvt-dev mailing list
> intel-gvt-dev@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev