On Fri, Dec 11, 2020 at 10:59:59PM +0100, Paolo Bonzini wrote:
> On 11/12/20 22:04, Thomas Gleixner wrote:
> > > Its 100ms off with migration, and can be reduced further (customers
> > > complained about 5 seconds but seem happy with 0.1ms).
> > What is 100ms? Guaranteed maximum migration time?
>
> I suppose it's the length between the time from KVM_GET_CLOCK and
> KVM_GET_MSR(IA32_TSC) to KVM_SET_CLOCK and KVM_SET_MSR(IA32_TSC).  But the
> VM is paused for much longer, the sequence for the non-live part of the
> migration (aka brownout) is as follows:
>
>     pause
>     finish sending RAM            receive RAM               ~1 sec
>     send paused-VM state          finish receiving RAM     \
>                                   receive paused-VM state   ) 0.1 sec
>                                   restart                  /
>
> The nanosecond and TSC times are sent as part of the paused-VM state at the
> very end of the live migration process.
>
> So it's still true that the time advances during live migration brownout;
> 0.1 seconds is just the final part of the live migration process.  But for
> _live_ migration there is no need to design things according to "people are
> happy if their clock is off by 0.1 seconds only".

Agree. What would be a good way to fix this?

It seems to me that using CLOCK_REALTIME, as in the interface Maxim is
proposing, is prone to differences in CLOCK_REALTIME itself between the
source and destination hosts.

Perhaps there is another way to measure that 0.1 sec which is
independent of the clock values of the source and destination hosts
(say, by sending a packet once the clock stops counting).

Then, on the destination, measure

	delta = clock_restart_time - packet_receive_time

and advance the clock by that amount.

> Again, save-to-disk,
> reverse debugging and the like are a different story, which is why KVM
> should delegate policy to userspace (while documenting how to do it right).
>
> Paolo
>
> > CLOCK_REALTIME and CLOCK_TAI are off by the time the VM is paused and
> > this state persists up to the point where NTP corrects it with a time
> > jump.
> >
> > So if migration takes 5 seconds then CLOCK_REALTIME is not off by 100ms
> > it's off by 5 seconds.
> >
> > CLOCK_MONOTONIC/BOOTTIME might be off by 100ms between pause and resume.
> >
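
To make the delta idea above concrete, here is a rough sketch in Python of
the arithmetic only. It is not an existing KVM or QEMU interface, and all
names in it (downtime_ns, restored_guest_ns, the "clock stopped" packet) are
made up for illustration. It assumes both timestamps are read from the
destination host's own monotonic clock, so the source host's clock never
enters the calculation. The packet's one-way transit time is not measured,
so the result undercounts the real pause by that amount.

```python
def downtime_ns(packet_received_ns: int, clock_restart_ns: int) -> int:
    """Time elapsed on the destination between the arrival of the
    hypothetical "clock stopped" packet and the guest clock restart.

    Both arguments are assumed to come from the same monotonic clock on
    the destination host (e.g. time.monotonic_ns()).
    """
    delta = clock_restart_ns - packet_received_ns
    if delta < 0:
        raise ValueError("clock restarted before the packet arrived")
    return delta


def restored_guest_ns(saved_guest_ns: int,
                      packet_received_ns: int,
                      clock_restart_ns: int) -> int:
    """Guest clock value to set at restart: the value saved when the
    clock stopped on the source, advanced by the measured delta."""
    return saved_guest_ns + downtime_ns(packet_received_ns, clock_restart_ns)


if __name__ == "__main__":
    # Guest clock stopped at 5 s; the packet arrived at t=10 s on the
    # destination's monotonic clock, and the clock restarts 0.1 s later.
    print(restored_guest_ns(5_000_000_000, 10_000_000_000, 10_100_000_000))
```

With these example numbers the guest clock would be advanced by 100 ms,
whatever CLOCK_REALTIME says on either host.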