Re: [QEMU PATCH v2] kvmclock: advance clock by time window between vm_stop and pre_save

Marcelo Tosatti <mtosatti@xxxxxxxxxx> · Wed, 9 Nov 2016 17:32:50 -0200

On Tue, Nov 08, 2016 at 11:32:30AM -0200, Marcelo Tosatti wrote:
> On Tue, Nov 08, 2016 at 10:22:56AM +0000, Dr. David Alan Gilbert wrote:
> > * Marcelo Tosatti (mtosatti@xxxxxxxxxx) wrote:
> > > On Mon, Nov 07, 2016 at 08:03:50PM +0000, Dr. David Alan Gilbert wrote:
> > > > * Marcelo Tosatti (mtosatti@xxxxxxxxxx) wrote:
> > > > > On Mon, Nov 07, 2016 at 03:46:11PM +0000, Dr. David Alan Gilbert wrote:
> > > > > > * Marcelo Tosatti (mtosatti@xxxxxxxxxx) wrote:
> > > > > > > This patch, relative to pre-copy migration codepath,
> > > > > > > measures the time between vm_stop() and pre_save(),
> > > > > > > which includes copying the remaining RAM to destination,
> > > > > > > and advances the clock by that amount.
> > > > > > > 
> > > > > > > In a VM with 5 seconds downtime, this reduces the guest
> > > > > > > clock difference on destination from 5s to 0.2s.
> > > > > > > 
> > > > > > > Tested with Linux and Windows 2012 R2 guests with -cpu XXX,+hv-time.
> > > > > > 
> > > > > > One thing that bothers me is that it's only this clock that's
> > > > > > getting corrected; doesn't it cause things to get upset when
> > > > > > > one clock moves and the others don't?
> > > > > 
> > > > > If you are correlating the clocks, then yes.
> > > > > 
> > > > > Older Linux guests get upset (marking the TSC clocksource unstable
> > > > > because the watchdog checks TSC vs kvmclock), but there is a workaround for it 
> > > > > in newer guests
> > > > > (kvmclock interface to notify watchdog to not complain).
> > > > > 
> > > > > Note marking TSC clocksource unstable on older guests is harmless
> > > > > because kvmclock is the standard clocksource.
> > > > > 
> > > > > For Windows guests, i don't know that Windows correlates between different
> > > > > clocks.
> > > > > 
> > > > > That is, there is relative control as to which software reads kvmclock 
> > > > > or Windows TIMER MSR, so i don't see the need to advance every clock 
> > > > > exposed.
> > > > > 
> > > > > > Shouldn't the pause delay be recorded somewhere architecturally
> > > > > > independent and then be a thing that kvm-clock happens to use and
> > > > > > other clocks might as well?
> > > > > 
> > > > > In theory, yes. In practice, i don't see the need for this... 
> > > > 
> > > > It seems unlikely to me that x86 is the only one that will want
> > > > to do something similar.
> > > 
> > > Can't they copy what kvmclock is doing today? 
> > 
> > We shouldn't have copies of code all over should we?
> > 
> > Dave
> 
> Fine i'll add a notifier.
> 

KVM Linux guests all have to:

Make CLOCK_MONOTONIC not count during vmpause
(because that mimics the behaviour of suspend/resume
on bare metal, and because of time overflows).

Assuming the HW clock counts while the VM is paused,
it means they have to hook into vmstate change
notifiers to:

        event           action
        vmstop          KVM_GET_CLOCK
        vmstart         KVM_SET_CLOCK (that earlier value)
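
A minimal sketch of that vmstop/vmstart pairing, for illustration only.
struct fake_vm, fake_get_clock() and fake_set_clock() are hypothetical
stand-ins (not QEMU or KVM API); in real code the two accessors would be
the KVM_GET_CLOCK and KVM_SET_CLOCK VM ioctls, called from a vmstate
change notifier:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a VM whose underlying HW clock keeps counting
 * while the guest is paused. */
struct fake_vm {
    uint64_t hv_clock_ns;     /* clock value as seen via the hypervisor */
    uint64_t saved_clock_ns;  /* value captured at vmstop */
    bool     clock_valid;     /* true while saved_clock_ns is meaningful */
};

/* Stand-in for KVM_GET_CLOCK. */
static uint64_t fake_get_clock(const struct fake_vm *vm)
{
    return vm->hv_clock_ns;
}

/* Stand-in for KVM_SET_CLOCK. */
static void fake_set_clock(struct fake_vm *vm, uint64_t ns)
{
    vm->hv_clock_ns = ns;
}

/* State-change hook: running == false is vmstop, running == true is vmstart. */
static void vm_state_changed(struct fake_vm *vm, bool running)
{
    assert(vm != NULL);

    if (!running) {
        /* vmstop: remember where the guest clock was. */
        vm->saved_clock_ns = fake_get_clock(vm);
        vm->clock_valid = true;
    } else if (vm->clock_valid) {
        /* vmstart: put the earlier value back, so the pause is not counted. */
        fake_set_clock(vm, vm->saved_clock_ns);
        vm->clock_valid = false;
    }
}
```

With this model, a clock that advances while the VM is stopped reads the
same value after vmstart as it did at vmstop.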

For x86 we want to start counting the time at vmstop,
because while the VM is running, the host TSC
is keeping track of time.
So you measure the amount of time between:

        -> Guest VM clock stops ticking.
        -> clock_gettime(CLOCK_MONOTONIC, &pointA);
        ...
        -> presave: clock_gettime(CLOCK_MONOTONIC, &pointB);
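
The measurement between those two points can be sketched roughly as
follows. This is not the patch itself: elapsed_ns() and
advance_saved_clock() are hypothetical helper names, and the sketch
assumes the saved clock value is in nanoseconds.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Convert a timespec to nanoseconds. */
static int64_t timespec_to_ns(struct timespec ts)
{
    return (int64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}

/* Nanoseconds elapsed from point_a to point_b, both taken with
 * clock_gettime(CLOCK_MONOTONIC, ...). */
static int64_t elapsed_ns(struct timespec point_a, struct timespec point_b)
{
    int64_t delta = timespec_to_ns(point_b) - timespec_to_ns(point_a);

    assert(delta >= 0); /* CLOCK_MONOTONIC does not go backwards */
    return delta;
}

/* Hypothetical helper: advance the clock value saved at vmstop by the
 * vmstop -> presave window, so the destination starts from the later time. */
static uint64_t advance_saved_clock(uint64_t saved_clock_ns,
                                    struct timespec point_a,
                                    struct timespec point_b)
{
    return saved_clock_ns + (uint64_t)elapsed_ns(point_a, point_b);
}
```

pointA would be taken right after the guest clock stops ticking and
pointB in presave; the result of advance_saved_clock() is what would be
sent to the destination.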

I measured the additional time between presave and EOF: it's about
5ms.

On destination, there is an additional 30ms between receiving EOF
and restoration of TSC.

Now, the measured clock difference is 130ms, and i am not sure where
the rest comes from; i am trying to figure that out. But the patch
should give a 35ms difference (5ms + 30ms), which is pretty good.
Chasing that down...

So in summary: i don't see the point of making the code
"generic" without knowing what the other arches 
want to do.
