Re: [QEMU PATCH] kvmclock: advance clock by time window between vm_stop and pre_save

Marcelo Tosatti <mtosatti@xxxxxxxxxx> · Fri, 4 Nov 2016 12:00:38 -0200

On Fri, Nov 04, 2016 at 10:35:39AM -0200, Marcelo Tosatti wrote:
> On Fri, Nov 04, 2016 at 01:28:48PM +0100, Juan Quintela wrote:
> > Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> > > This patch, relative to pre-copy migration codepath,
> > > measures the time between vm_stop() and pre_save(), 
> > > which includes copying the remaining RAM to destination,
> > > and advances the clock by that amount.
> > >
> > > In a VM with 5 seconds downtime, this reduces the guest 
> > > clock difference on destination from 5s to 0.2s.
> > >
> > > Please do not apply this yet as some codepaths still need
> > > checking, submitting early for comments.
> > >
> > > Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
> > 
> > You can use an optional section, and then you don't need to increase the
> > version number.
> 
> Optional section is more appropriate, thanks.
> 
> > I believe you that the clock manipulation is right, only talking about
> > the migration bits.
> > 
> > > +static uint64_t clock_delta(struct timespec *before, struct timespec *after)
> > > +{
> > > +    if (before->tv_sec > after->tv_sec ||
> > > +        (before->tv_sec == after->tv_sec &&
> > > +         before->tv_nsec > after->tv_nsec)) {
> > > +        fprintf(stderr, "clock_delta failed: before=(%ld sec, %ld nsec),"
> > > +                        "after=(%ld sec, %ld nsec)\n", before->tv_sec,
> > > +                        before->tv_nsec, after->tv_sec, after->tv_nsec);
> > > +        abort();
> > > +    }
> > > +
> > > +    return (after->tv_sec - before->tv_sec) * 1000000000ULL +
> > > +            after->tv_nsec - before->tv_nsec;
> > > +}
> > 
> > I can't believe that we don't have a helper function already to
> > calculate this....
> 
> Couldn't find any...
> 
> > > +
> > > +static void kvmclock_pre_save(void *opaque)
> > > +{
> > > +    KVMClockState *s = opaque;
> > > +    struct timespec now;
> > > +    uint64_t ns;
> > > +
> > > +    if (s->t_aftervmstop.tv_sec == 0) {
> > > +        return;
> > > +    }
> > 
> > You have your test here.
> > 
> > > +
> > > +    clock_gettime(CLOCK_MONOTONIC, &now);
> > > +
> > > +    ns = clock_delta(&s->t_aftervmstop, &now);
> > > +
> > > +    /*
> > > +     * Linux guests can overflow if time jumps
> > > +     * forward in large increments.
> > > +     * Cap maximum adjustment to 10 minutes.
> > > +     */
> > > +    ns = MIN(ns, 600000000000ULL);
> > > +
> > > +    if (s->clock + ns > s->clock) {
> > > +        s->ns = ns;
> > 
> > Would it be a good idea to print an error message here?  If it has been more
> > than 10 minutes since we did the vmstop, something went wrong here.
> 
> Not sure... is it not possible for the user to stop migration in some 
> way? 
> 
> What if network is very slow and maxdowntime very high?
> 
> > > +    }
> > > +}
> > > +
> > > +static int kvmclock_post_load(void *opaque, int version_id)
> > > +{
> > > +    KVMClockState *s = opaque;
> > > +
> > > +    /* save the value from incoming migration */
> > > +    s->advance_clock = s->ns;
> > > +
> > > +    return 0;
> > > +}
> > > +
> > >  static const VMStateDescription kvmclock_vmsd = {
> > >      .name = "kvmclock",
> > > -    .version_id = 1,
> > > +    .version_id = 2,
> > >      .minimum_version_id = 1,
> > > +    .pre_save = kvmclock_pre_save,
> > > +    .post_load = kvmclock_post_load,
> > >      .fields = (VMStateField[]) {
> > >          VMSTATE_UINT64(clock, KVMClockState),
> > > +        VMSTATE_UINT64_V(ns, KVMClockState, 2),
> > >          VMSTATE_END_OF_LIST()
> > >      }
> > >  };
> > 
> > 
> > If you need help with the subsection stuff, just ask.
> > 
> > Later, Juan.
> 
> OK, I'll try to cook up an optional section and let's see what happens.
> 
> Thanks Juan.

OK, so by "optional section" I meant a section that, when sent
to the destination, could be ignored and migration would still succeed.

The alternative (what this patch has now) is to increase migration
version so that:

    1. older machine types remain compatible. 
    2. newer machine types fail to migrate.

The reason is that the data being sent, ns, is not really optional: if
kvmclock or Hyper-V time is enabled (which should be 100% of the cases),
we always want to send that data.

That is, there is no difference between:

* Writing a subsection with needed=1 always (except when
using older machine types).
* Using old/new machine types with particular versions.
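
To make the comparison above concrete, here is a minimal standalone
sketch in C. It is not QEMU code: ClockStateModel, send_ns and
fields_on_wire() are hypothetical names invented for illustration, and
send_ns stands in for whatever compat property an older machine type
would clear. The only thing borrowed from VMState is the shape of the
.needed callback, bool (*)(void *opaque).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device state; not the real KVMClockState. */
typedef struct {
    uint64_t clock;
    uint64_t ns;
    bool send_ns;   /* assumed compat flag: false on older machine types */
} ClockStateModel;

/* Same shape as a VMState ".needed" callback: bool (*)(void *opaque). */
bool ns_subsection_needed(void *opaque)
{
    ClockStateModel *s = opaque;

    return s->send_ns;
}

/* Number of 64-bit fields this model would put on the wire. */
unsigned fields_on_wire(ClockStateModel *s)
{
    return ns_subsection_needed(s) ? 2 : 1;
}
```

With the flag keyed off the machine type, the predicate is constant for
a given machine type, which is why it ends up behaving like a version
bump tied to that machine type.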

I think I missed the patch to switch current machine
types to kvmclock v1, BTW.
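
For reference, the delta-and-cap arithmetic from the quoted patch can be
sketched standalone as below. This is a variation under my own
assumptions, not the patch itself: timespec_delta_ns() and
capped_advance_ns() are hypothetical names, and the delta helper reports
failure through its return value instead of calling abort().

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC   1000000000ULL
#define MAX_ADVANCE_NS (600ULL * NSEC_PER_SEC)  /* 10 minutes, as in the patch */

/*
 * Store (after - before) in nanoseconds in *delta.
 * Returns false, leaving *delta untouched, if "after" is earlier
 * than "before".
 */
bool timespec_delta_ns(const struct timespec *before,
                       const struct timespec *after, uint64_t *delta)
{
    if (before->tv_sec > after->tv_sec ||
        (before->tv_sec == after->tv_sec &&
         before->tv_nsec > after->tv_nsec)) {
        return false;
    }

    *delta = (uint64_t)(after->tv_sec - before->tv_sec) * NSEC_PER_SEC
             + (uint64_t)after->tv_nsec - (uint64_t)before->tv_nsec;
    return true;
}

/* Clamp the advance to MAX_ADVANCE_NS; return 0 if clock + ns would wrap. */
uint64_t capped_advance_ns(uint64_t clock, uint64_t ns)
{
    if (ns > MAX_ADVANCE_NS) {
        ns = MAX_ADVANCE_NS;
    }
    if (clock > UINT64_MAX - ns) {
        return 0;
    }
    return ns;
}
```

For example, with before = 10.9s and after = 15.1s the delta comes out
as 4200000000 ns, and any delta above 10 minutes is clamped to
600000000000 ns.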
