On Thu, Jan 03, 2019 at 05:18:49PM -0200, Marcelo Tosatti wrote: > > The invariant TSC bit has the following meaning: > > "The time stamp counter in newer processors may support an enhancement, > referred to as invariant TSC. Processor's support for invariant TSC > is indicated by CPUID.80000007H:EDX[8]. The invariant TSC will run > at a constant rate in all ACPI P-, C-. and T-states. This is the > architectural behavior moving forward. On processors with invariant TSC > support, the OS may use the TSC for wall clock timer services (instead > of ACPI or HPET timers). TSC reads are much more efficient and do not > incur the overhead associated with a ring transition or access to a > platform resource." > > IOW, TSC does not change frequency. In such case, and with > TSC scaling hardware available to handle migration, it is possible > to use the TSC clocksource directly, whose system calls are > faster. > > Reduce the rating of kvmclock clocksource to allow TSC clocksource > to be the default if invariant TSC is exposed. > > Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx> > > diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c > index 30084ec..575857c 100644 > --- a/arch/x86/kernel/kvmclock.c > +++ b/arch/x86/kernel/kvmclock.c > @@ -35,6 +35,7 @@ > #include <asm/x86_init.h> > #include <asm/reboot.h> > #include <asm/kvmclock.h> > +#include <asm/processor.h> > > static int kvmclock __initdata = 1; > static int kvmclock_vsyscall __initdata = 1; > @@ -325,6 +326,7 @@ static int kvmclock_setup_percpu(unsigned int cpu) > void __init kvmclock_init(void) > { > u8 flags; > + struct cpuinfo_x86 *c = &boot_cpu_data; > > if (!kvm_para_available() || !kvmclock) > return; > @@ -368,6 +370,18 @@ void __init kvmclock_init(void) > machine_ops.crash_shutdown = kvm_crash_shutdown; > #endif > kvm_get_preset_lpj(); > + > + /* > + * c->x86_power is 8000_0007 edx. Bit 8 is TSC runs at constant rate > + * with P/T states and does not stop in deep C-states. > + * > + * Invariant TSC exposed by host means kvmclock is not necessary: > + * can use TSC as clocksource. > + * > + */ > + if (c->x86_power & (1 << 8)) > + kvm_clock.rating = 299; Why check x86_power manually instead of using the feature bits that all x86 CPUs set based on x86_power? And it seems like this should check tsc_unstable as well, e.g. to account for the user explicitly disabling the TSC via kernel params. I.e.: if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) && boot_cpu_has(X86_FEATURE_NONSTOP_TSC) && !check_tsc_unstable()) kvm_clock.rating = 299; Note that I have no idea if there are other changes necessary to promote the TSC above kvmclock, i.e. someone with more clock knowledge should review this :) > + > clocksource_register_hz(&kvm_clock, NSEC_PER_SEC); > pv_info.name = "KVM"; > } > > > >