Re: [QEMU PATCH] kvmclock: advance clock by time window between vm_stop and pre_save

Radim Krčmář <rkrcmar@xxxxxxxxxx> · Fri, 4 Nov 2016 18:34:20 +0100

2016-11-04 14:24-0200, Marcelo Tosatti:
> On Fri, Nov 04, 2016 at 04:48:28PM +0100, Radim Krčmář wrote:
>> 2016-11-04 16:33+0100, Paolo Bonzini:
>> > On 04/11/2016 16:25, Radim Krčmář wrote:
>> >>> >  
>> >>> > +        if (s->advance_clock && s->clock + s->advance_clock > s->clock) {
>> >>> > +            s->clock += s->advance_clock;
>> >>> > +            s->advance_clock = 0;
>> >>> > +        }
>> >> Can't the advance_clock added to the migrated KVMClockState instead of
>> >> passing it as another parameter?
>> >> 
>> >> (It is sad that we can't just query KVMClockState in kvmclock_pre_save
>> >>  because of the Linux bug.)
>> > 
>> > What Linux bug?  The one that makes us use kvmclock_current_nsec?
>> 
>> No, the one that forced Marcelo to add the 10 minute limit to the
>> advance_clock. 
> 
> Overflow when reading clocksource, in the timer interrupt.

Is it still there?  I wonder why 10 minutes is the limit. :)

>>  We wouldn't need this advance_clock hack if we could
>> just call KVM_GET_CLOCK like we did before 00f4d64ee76e ("kvmclock:
>> clock should count only if vm is running").
> 
> People have requested that CLOCK_MONOTONIC stops when 
> vm suspends.

Ugh, we could have done that on the OS level, but stopping kvmclock
means that we sabotaged CLOCK_BOOTTIME as it doesn't return
CLOCK_MONOTONIC + time spent in suspend anymore.

And kvmclock is defined to count nanosecond (slightly different in
practice) since some point in time, so pausing it is not really valid.

>> > It should work with 4.9-rc (well, once Linus applies my pull request).
>> > 4.9-rc will not return ktime_get_ns for KVM_GET_CLOCK; it will return
>> > the raw value from the kernel timekeeper.
>> > 
>> > I'm thinking that we should add a KVM capability for this, and skip
>> > kvmclock_current_nsec if the capability is present.  The first part is
>> > trivial, so we can do it even during Linux rc period.
>> 
>> I agree, that sounds like a nice improvement.
> 
> I fail to see how this (KVM_GET_CLOCK using raw value (vs NTP corrected)
> relates to clocksource overflow and CLOCK_MONOTONIC stop requests.

If the guest uses master clock, then kvmclock drifts away from BOOTTIME,
so returning the value of kvmclock using kernel_ns() would introduce a
drift when setting the clock -- we must read the same way that the guest
does.

> -----
> 
> Todo list (collective???):
> 
> 1) Implement suggestion to Roman: 
> "Subject: Re: [PATCH] x86/kvm: fix condition to update kvm master
> clocks" to handle the time backwards when updating masterclock 
> from kernel clock.
> 
> 2) Review TSC scaling support (make sure scaled TSC 
> is propagated properly everywhere).
> 
> 3) Enable migration with invariant TSC.
> 
> 4) Fullvirt fix for local APIC deadline timer bug

The list looks good.

I'll resume work on (4) now that rework of APIC timers is merged.
It's going to be ugly on the guest side, because linux could be using it
even when not using kvmclock for sched clock ...

> (BTW Paolo Windows is also vulnerable right).

Ugh, we can't do much more than disabling TSC deadline there until QEMU
can migrate it.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html