Re: invtsc + migration + TSC scaling

On Wed, Oct 19, 2016 at 05:42:16PM +0200, Radim Krčmář wrote:
> 2016-10-19 11:55-0200, Eduardo Habkost:
> > On Wed, Oct 19, 2016 at 03:27:52PM +0200, Radim Krčmář wrote:
> >> 2016-10-18 19:05-0200, Eduardo Habkost:
> >> > On Tue, Oct 18, 2016 at 10:52:14PM +0200, Radim Krčmář wrote:
> >> > [...]
> >> >> The main problem is that QEMU changes virtual_tsc_khz when migrating
> >> >> without hardware scaling, so KVM is forced to get nanoseconds wrong ...
> >> >> 
> >> >> If QEMU doesn't want to keep the TSC frequency constant, then it would
> >> >> be better if it didn't expose TSC in CPUID -- guest would just use
> >> >> kvmclock without being tempted by direct TSC accesses.
> >> > 
> >> > Isn't it enough to simply not expose invtsc? Aren't guests expected
> >> > to assume the TSC frequency can change if invtsc isn't set on
> >> > CPUID?
> >> 
> >> There are exceptions.  An OS can assume constant TSC on some models that
> >> QEMU emulates: coreduo, core2duo, Conroe, Penryn, n270, kvm32 and kvm64.
> >> The list from SDM (17.15 TIME-STAMP COUNTER):
> >> 
> >>   Pentium 4 processors, Intel Xeon processors (family [0FH], models [03H
> >>   and higher]); Intel Core Solo and Intel Core Duo processors (family
> >>   [06H], model [0EH]); the Intel Xeon processor 5100 series and Intel
> >>   Core 2 Duo processors (family [06H], model [0FH]); Intel Core 2 and
> >>   Intel Xeon processors (family [06H], DisplayModel [17H]); Intel Atom
> >>   processors (family [06H], DisplayModel [1CH]))
> >> 
> >> Another sad part is that Linux uses the following condition to assume
> >> constant TSC frequency:
> >> 
> >>   	if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
> >>   		(c->x86 == 0x6 && c->x86_model >= 0x0e))
> >>   		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
> >> 
> >> which sets constant TSC for all modern processors.  It's not a
> >> problem on real hardware, because all modern processors likely have
> >> invariant TSC.
> >> 
> >> Fun fact: Linux shows constant_tsc flag in /proc/cpuinfo even if the
> >>           modern CPU doesn't expose TSC in CPUID.
> >> 
> >> Considering that Linux is fixed on Nehalem and newer processors, we have
> >> a few options for the rest:
> >>  1) treat TSC like invariant TSC on those models (the guest cannot use
> >>     ACPI state, so its OS might assume that they are equivalent)
> >>  2) hide TSC on those models
> >>  3) ignore the problem
> >>  4) remove those models
> >> 
> >> I don't know enough about QEMU design goals to guess which one is the
> >> most appropriate.  (4) is the clear winner for me, followed by (3). :)
> > 
> > (4) can't be implemented because it breaks existing
> > configurations. (3) is the current solution.
> 
> Existing machine types must remain compatible, but isn't it possible to
> cull options in new machine types?

We specifically promised to libvirt developers that a CPU model
that can be started with a machine-type should be still runnable
with other versions of the same machine-type family. In other
words, a running config should keep working if only the
machine-type version changed.

> 
> > Option (2) sounds attractive to me, but seems risky.
> 
> Definitely.
> If users have a setup that works, then any change can break it.
> 
> It would have been the best option a few years back when we wrote the
> code, but now the change will happen *in* the guest, so we can't control
> it as in the case of (4), where broken guests won't start, or (1), where
> broken guests won't migrate.
> 
> >                                                      I would like
> > to understand the consequences for guests. What could stop
> > working if we remove TSC? What about kvmclock?
> 
> Hiding TSC in CPUID doesn't disable the RDTSC instruction in the guest.
> 
> kvmclock is a paravirtual device on top of TSC, so if kvmclock is
> present, then it should be safe to assume that the guest can use TSC for
> operations with kvmclock.
> Linux does that, but I don't think this behavior was ever written down,
> so other kvmclock users could break.
> 
> Maybe Hyper-V TSC page would stop working, because Windows and other
> users could have a check for CPUID.1:EDX.TSC separately.
> Linux's implementation would work, because it just checks for the
> paravirtual feature, like in case of kvmclock.
> 
> And minor cases are: an OS that has no other option than TSC for clock;
> userspace that checks TSC before using it; an OS that stops setting
> CR4.TSD and its userspace starts to use TSC; and probably many others.

OK, that sounds very risky. This means it is probably better to
let management software explicitly choose the new stricter
behavior.

...and we already have a mechanism to request stricter behavior:
explicitly disabling TSC, or setting tsc-frequency explicitly on
the command-line.

> 
> > If we implement (2), we could even add an extra check that blocks
> > migration (or at least prints a warning) in case:
> > 1) TSC is forcibly enabled in the configuration;
> > 2) TSC scaling is not available on destination; and
> > 3) the family/model values match the ones on the list above.
> > 
> > And we could even keep TSC enabled by default for users who don't
> > want migration (using migratable=false).
> 
> That would be nice.

We already print a warning if there's a TSC frequency mismatch
without TSC scaling. I wonder if we should reduce false positives
by printing it only when family/model is on the list above (or if
invtsc is enabled).
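
A rough sketch of that narrower check, only to make the condition
concrete. The struct and function names are hypothetical (this is not
existing QEMU code), and the family/model values are taken from the
SDM list quoted above:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of a guest CPU configuration; not a QEMU type. */
struct guest_cpu {
    unsigned family;
    unsigned model;   /* display model */
    bool invtsc;      /* invtsc exposed to the guest */
};

/* True if family/model is on the SDM constant-TSC list quoted above. */
static bool sdm_constant_tsc_model(unsigned family, unsigned model)
{
    if (family == 0x0f)
        return model >= 0x03;
    if (family == 0x06)
        return model == 0x0e || model == 0x0f ||
               model == 0x17 || model == 0x1c;
    return false;
}

/*
 * Warn only when the mismatch can actually mislead the guest:
 * the frequency differs, the destination cannot scale the TSC, and
 * the guest either has invtsc or may assume constant TSC by model.
 */
static bool should_warn_tsc_mismatch(const struct guest_cpu *cpu,
                                     bool freq_mismatch,
                                     bool tsc_scaling)
{
    if (!freq_mismatch || tsc_scaling)
        return false;
    return cpu->invtsc || sdm_constant_tsc_model(cpu->family, cpu->model);
}
```

With this, a family 6 / model 0x0f guest (e.g. Conroe) would still get
the warning, while a newer model without invtsc would not. Note that
this follows the SDM list, not the broader Linux condition.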

-- 
Eduardo