Re: [QEMU PATCH v2] kvmclock: advance clock by time window between vm_stop and pre_save

Paolo Bonzini <pbonzini@xxxxxxxxxx> · Wed, 9 Nov 2016 17:33:09 +0100

On 09/11/2016 17:28, Dr. David Alan Gilbert wrote:
> * Paolo Bonzini (pbonzini@xxxxxxxxxx) wrote:
>>
>>
>> On 08/11/2016 11:22, Dr. David Alan Gilbert wrote:
>>> * Marcelo Tosatti (mtosatti@xxxxxxxxxx) wrote:
>>>> On Mon, Nov 07, 2016 at 08:03:50PM +0000, Dr. David Alan Gilbert wrote:
>>>>> * Marcelo Tosatti (mtosatti@xxxxxxxxxx) wrote:
>>>>>> On Mon, Nov 07, 2016 at 03:46:11PM +0000, Dr. David Alan Gilbert wrote:
>>>>>>> * Marcelo Tosatti (mtosatti@xxxxxxxxxx) wrote:
>>>>>>>> This patch, relative to pre-copy migration codepath,
>>>>>>>> measures the time between vm_stop() and pre_save(),
>>>>>>>> which includes copying the remaining RAM to destination,
>>>>>>>> and advances the clock by that amount.
>>>>>>>>
>>>>>>>> In a VM with 5 seconds downtime, this reduces the guest
>>>>>>>> clock difference on destination from 5s to 0.2s.
>>>>>>>>
>>>>>>>> Tested with Linux and Windows 2012 R2 guests with -cpu XXX,+hv-time.
>>>>>>>
>>>>>>> One thing that bothers me is that it's only this clock that's
>>>>>>> getting corrected; doesn't it cause things to get upset when
>>>>>>> one clock moves and the others don't?
>>>>>>
>>>>>> If you are correlating the clocks, then yes.
>>>>>>
>>>>>> Older Linux guests get upset (marking the TSC clocksource unstable
>>>>>> because the watchdog checks TSC vs kvmclock), but there is a
>>>>>> workaround for it in newer guests (a kvmclock interface to notify
>>>>>> the watchdog not to complain).
>>>>>>
>>>>>> Note marking TSC clocksource unstable on older guests is harmless
>>>>>> because kvmclock is the standard clocksource.
>>>>>>
>>>>>> For Windows guests, I don't know that Windows correlates between
>>>>>> different clocks.
>>>>>>
>>>>>> That is, there is relative control as to which software reads kvmclock
>>>>>> or Windows TIMER MSR, so I don't see the need to advance every clock
>>>>>> exposed.
>>>>>>
>>>>>>> Shouldn't the pause delay be recorded somewhere architecturally
>>>>>>> independent and then be a thing that kvm-clock happens to use and
>>>>>>> other clocks might as well?
>>>>>>
>>>>>> In theory, yes. In practice, I don't see the need for this...
>>>>>
>>>>> It seems unlikely to me that x86 is the only one that will want
>>>>> to do something similar.
>>>>
>>>> Can't they copy what kvmclock is doing today? 
>>>
>>> We shouldn't have copies of code all over should we?
>>
>> Let's cross the bridge when we get there.
> 
> That will mean it has the migration data in the wrong place
> and any other clocks that need to be incremented by the same offset
> will need a hook or be inconsistent with this calculation.

No, there is no additional migration data that is needed.  This is just
a bug in how the pausing of CLOCK_MONOTONIC was implemented for the
kvmclock clocksource.

Right now, x86 is the only case where we have the problem, and x86 is
using a single "backend" for both kvmclock and the Hyper-V TSC reference
page.

For everyone else, there is no clocksource paravirtualization going on
(luckily, considering what a mess kvmclock is).  They can just use
QEMU_CLOCK_VIRTUAL if they want something that pauses while the VM is stopped.
Now, QEMU_CLOCK_VIRTUAL actually has the same bug that Marcelo is
fixing, so we may indeed want a common solution if possible.  But again,
let's see first what the code looks like for _one_ clocksource, before
writing a generalized (and thus more complex) solution.
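
To make the idea concrete, here is a minimal standalone sketch of the
adjustment being discussed: record the host time at vm_stop, then at
pre_save advance the guest-visible clock by the time elapsed since then.
It is not the patch itself. SketchClock, sketch_vm_stop() and
sketch_pre_save() are hypothetical names, and host time is passed in as
a plain nanosecond count instead of being read from any QEMU clock.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a paravirtual clock; not QEMU's actual state. */
typedef struct {
    uint64_t guest_ns;      /* clock value the guest will see */
    int64_t  stop_host_ns;  /* host time sampled when the VM was stopped */
    int      stopped;       /* nonzero once sketch_vm_stop() has run */
} SketchClock;

/* Called when the VM is stopped: remember when the guest clock froze. */
void sketch_vm_stop(SketchClock *c, int64_t host_now_ns)
{
    c->stop_host_ns = host_now_ns;
    c->stopped = 1;
}

/*
 * Called just before the clock state is saved for migration: advance the
 * guest clock by the window between vm_stop and pre_save, so the time
 * spent copying the remaining RAM is not lost on the destination.
 */
uint64_t sketch_pre_save(SketchClock *c, int64_t host_now_ns)
{
    assert(c->stopped);
    int64_t delta = host_now_ns - c->stop_host_ns;
    if (delta > 0) {
        /* Ignore a negative delta; never move the guest clock backwards. */
        c->guest_ns += (uint64_t)delta;
    }
    return c->guest_ns;
}
```

With a guest clock at 1000 ns, a stop at host time 5000 ns and a
pre_save at 9800 ns, the saved value would be 5800 ns.  A generalized
version would keep the same delta somewhere every clock could read it,
which is the part I would rather defer.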

Paolo