On Mon, Oct 17, 2016 at 04:50:09PM +0200, Radim Krčmář wrote:
> 2016-10-17 07:47-0200, Marcelo Tosatti:
> > On Fri, Oct 14, 2016 at 06:20:31PM -0300, Eduardo Habkost wrote:
> >> I have been wondering: should we allow live migration with the
> >> invtsc flag enabled, if TSC scaling is available on the
> >> destination?
> >
> > TSC scaling and invtsc flag, yes.
>
> Yes, if we have well synchronized time between hosts, then we might be
> able to migrate with a TSC shift that cannot be perceived by the guest.

Even if the guest can't detect the TSC difference (relative to
realtime), I suppose TSC should be advanced to account for the migration
stopped time (so that TSC appears to have incremented at a "constant
rate").

> Unless the VM also has a migratable assigned PCI device that uses ART,
> because we have no protocol to update the setting of ART (in CPUID), so
> we should keep migration forbidden then.

What is the use case for ART again? (I need to catch up on that.)

>
> >> For reference, this is what the Intel SDM says about invtsc:
> >>
> >>   The time stamp counter in newer processors may support an
> >>   enhancement, referred to as invariant TSC. Processor’s support
> >>   for invariant TSC is indicated by CPUID.80000007H:EDX[8].
> >>
> >>   The invariant TSC will run at a constant rate in all ACPI P-,
> >>   C-. and T-states. This is the architectural behavior moving
> >>   forward. On processors with invariant TSC support, the OS may
> >>   use the TSC for wall clock timer services (instead of ACPI or
> >>   HPET timers). TSC reads are much more efficient and do not
> >>   incur the overhead associated with a ring transition or access
> >>   to a platform resource.
> >
> > Yes. The blockage happened for different reasons:
> >
> > 1) Migration: to host with different TSC frequency.
>
> We shouldn't have done this even now when emulating anything newer than
> Pentium 4, because those CPUs have constant TSC, which only lacks the
> guarantee that it doesn't stop in deep C-states:
>
>   For [a list of processors we emulate]: the time-stamp counter
>   increments at a constant rate. That rate may be set by the maximum
>   core-clock to bus-clock ratio of the processor or may be set by the
>   maximum resolved frequency at which the processor is booted. The
>   maximum resolved frequency may differ from the processor base
>   frequency, see Section 18.18.2 for more detail. On certain processors,
>   the TSC frequency may not be the same as the frequency in the brand
>   string.
>
>   The specific processor configuration determines the behavior. Constant
>   TSC behavior ensures that the duration of each clock tick is uniform
>   and supports the use of the TSC as a wall clock timer even if the
>   processor core changes frequency. This is the architectural behavior
>   moving forward.
>
> Invariant TSC is more useful, though, so more applications would break
> when migrating to a different TSC frequency.
>
> > 2) Savevm: It is not safe to use the TSC for wall clock timer
> > services.
>
> With constant TSC, we could argue that a shift to deep C-state happened
> and paused TSC, which is not a good behavior, but somewhat defensible.
>
> > By allowing savevm, you make a commitment to allow a feature
> > at the expense of not complying with the spec (specifically the "
> > the OS may use the TSC for wall clock timer services", because the
> > TSC stops relative to realtime for the duration of the savevm stop
> > window).
>
> Yep, we should at least guesstimate the TSC to allow the guest to resume
> with as small TSC-shift as possible and check that hosts were somewhat
> synchronized with UTC (or something we choose for time).

There are two options for savevm:

Option 1) Stop the TSC for savevm duration.
Option 2) Advance TSC to match realtime (this is known to overflow Linux
timekeeping though).

>
> > But since Linux guests use kvmclock and Windows guests use Hyper-V
> > enlightenment, it should be fine to disable 2).
> >
> > There is a bug open for this, btw:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1353073
>
> These people should be happy with just live-migrations, so can't we just
> keep savevm forbidden?

I don't see why. Perhaps savevm should be considered a "special type of
operation" that deviates from bare-metal behaviour: if the user does
savevm, they accept that the TSC does not count "at a constant rate"
(so savevm breaks invariant TSC behaviour).
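
To make the two savevm options concrete, here is a rough sketch of the
arithmetic. It is illustration only, not QEMU or KVM code: the function
name, its parameters and the numbers in the usage note are all made up,
and it assumes the guest TSC frequency is known in kHz and the stopped
time is measured in nanoseconds against a clock the hosts agree on.

```python
TSC_MASK = (1 << 64) - 1  # the TSC is a 64-bit counter


def tsc_after_savevm(saved_tsc, stopped_ns, tsc_khz, advance):
    """Return the guest TSC value to load when the VM resumes.

    Hypothetical helper, for illustration only.

    saved_tsc  -- guest TSC value captured when the VM was stopped
    stopped_ns -- realtime spent stopped, in nanoseconds
    tsc_khz    -- guest TSC frequency in kHz
    advance    -- False: Option 1 (TSC stops for the savevm duration)
                  True:  Option 2 (TSC advanced to match realtime)
    """
    if not advance:
        # Option 1: the guest sees no TSC ticks for the stopped window.
        return saved_tsc
    # Option 2: cycles = ns * (kHz * 1000) / 1e9 = ns * kHz / 1e6.
    elapsed_cycles = stopped_ns * tsc_khz // 1_000_000
    return (saved_tsc + elapsed_cycles) & TSC_MASK
```

With a 2.5 GHz guest TSC (tsc_khz = 2500000) and a 2 second stop,
Option 2 adds 5 * 10^9 cycles, while Option 1 leaves the value
untouched. For a snapshot restored days later the jump grows in
proportion, which is the case where the guest's timekeeping gets into
trouble.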