On 04/18/2010 02:21 AM, Espen Berg wrote:
On 17.04.2010 22:17, Michael Tokarev wrote:
We have three KVM hosts that support live migration between them, but
one of our problems is time drift. The three frontends have different
CPU frequencies, and the KVM guests adopt the frequency of the host
machine where they were first started.
What do you mean by "adopts"? Note that the CPU frequency
means nothing to any modern operating system; that has been
true at least since the days when MS-DOS, which relied on CPU
frequency for its time functions, was in common use. All
interesting things are now done using timers instead, and timers
(which, again, don't depend on CPU frequency) usually work quite well.
The assumption that the tick frequency was calculated from the host's
MHz was based on the fact that greater clock frequency differences
caused higher time drift. A 60 MHz difference caused about 24 min of
drift; a 332 MHz difference caused about 2 h 25 min of drift.
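As a rough sanity check on those two data points, the drift per MHz of frequency difference can be compared. This is only a sketch: the period over which each drift built up is not stated above, so it assumes both were measured over a similar interval.

```python
# Reported data points from the message above:
# (host frequency difference in MHz, observed drift in minutes).
observations = [
    (60, 24),            # about 24 min of drift
    (332, 2 * 60 + 25),  # about 2 h 25 min of drift
]

def drift_per_mhz(delta_mhz, drift_min):
    """Minutes of drift per MHz of host frequency difference."""
    return drift_min / delta_mhz

ratios = [drift_per_mhz(d, m) for d, m in observations]
for (d, m), r in zip(observations, ratios):
    print(f"{d} MHz difference: {m} min drift, {r:.2f} min per MHz")
```

Both ratios come out close to 0.4 min per MHz, which is consistent with drift growing roughly in proportion to the frequency difference.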
What complicates things is that the cheapest time source that is
still accurate enough is the TSC (the time stamp counter register
in the CPU), but it will definitely be different on each
machine. For that, kvm 0.12.3 and the 2.6.32 kernel (I think)
introduced a compensation. See for example the -tdf kvm option.
Ah, nice to know. :)
There are two different things here:
The issue that Espen is reporting is that the hosts have different
frequencies, and guests that rely on the tsc as a source clock will notice
that after migration. This is indeed a problem that -tdf does not solve;
-tdf only adds compensation for the RTC clock emulation.
What's the guest type and what's the guest's source clock?
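For a Linux guest, the source clock in use can be read from sysfs inside the guest. A minimal sketch follows; the helper name read_clocksource is made up for illustration, while the sysfs path is the standard Linux one. A guest using the pv clock reports "kvm-clock" there.

```python
from pathlib import Path

# Standard Linux sysfs location for clocksource information.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")

def read_clocksource(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources, or (None, []) if unreadable."""
    try:
        current = (Path(base) / "current_clocksource").read_text().strip()
        available = (Path(base) / "available_clocksource").read_text().split()
    except OSError:
        # Not Linux, or sysfs is not mounted: report nothing rather than fail.
        return None, []
    return current, available

if __name__ == "__main__":
    current, available = read_clocksource()
    print("current clocksource:", current)
    print("available clocksources:", " ".join(available))
```

If the current clocksource is "tsc" on a guest that gets migrated between hosts with different frequencies, that would match the symptom described above.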
Using tsc directly as a source clock is not recommended because of this
migration issue (which is not solvable unless we trap every rdtsc by the
guest). Using pv kvmclock in Linux mitigates this issue since it exposes
both the tsc and the host clock, so guests can adjust themselves.
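To illustrate why a guest that counts raw tsc ticks drifts after migration, here is a minimal model. It assumes the guest keeps converting ticks with the frequency it calibrated on the first host while the counter actually advances at the new host's rate. The function name and the frequencies are made-up values for illustration, not Espen's hosts.

```python
def guest_drift_seconds(calibrated_hz, actual_hz, real_seconds):
    """Seconds the guest clock gains (+) or loses (-) over real_seconds.

    The guest divides tick counts by calibrated_hz, but the tsc really
    advances at actual_hz, so guest time runs at actual_hz / calibrated_hz
    times the real rate.
    """
    guest_seconds = real_seconds * actual_hz / calibrated_hz
    return guest_seconds - real_seconds

# Hypothetical hosts: guest started on a 2.000 GHz host, then migrated
# to a 2.060 GHz host and left running for one day.
drift = guest_drift_seconds(2_000_000_000, 2_060_000_000, 24 * 3600)
print(f"drift after one day: {drift / 60:.1f} min")
```

In this model a 3% frequency mismatch makes the guest gain about 43 minutes per day, and the drift scales linearly with the frequency difference.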
Several months ago a pvclock migration fix was added to pass the pvclock
MSR readings to the destination: 1a03675db146dfc760b3b48b3448075189f142cc
Since this is a cluster in production, I'm not able to try the latest
version either.
Well, that's a difficult one, no? It either works or not.
If you can't try anything else, why ask? :)
What I tried to say was that there are many important virtual servers
running on this cluster at the moment, so "trial and error" was not an
option. The last time we tried 0.12.x (during the initial tests of the
cluster) there were a lot of stability issues, crashes during migration,
etc.
Regards, Espen
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html