2016-01-07 00:41-0800, Andy Lutomirski:
> On Wed, Jan 6, 2016 at 11:18 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> AFAICT KVM reliably passes a monotonic TSC through to guests, even if
>> the host suspends.  That's all that sched_clock needs, I think.
>>
>> So why does kvmclock have a custom sched_clock?

If the host CPU has enough features, then yes, KVM can take care of
everything and kvmclock has no advantage over TSC, even when migrating
to a host with a different TSC frequency, because modern CPUs support
TSC offset + scaling in guests.

The problem is with antiques.  Guests on old CPUs need more information
on top of the TSC to be able to get a useful system time.  And old KVM
doesn't provide good information, so we have legacy layers everywhere.

kvmclock in the guest can simply be equal to rdtsc() on modern CPUs, but
we still want to use the kvmclock wrapper, because kvmclock can provide
a stable clock regardless of the underlying TSC (in theory).

>> On a related note, KVM doesn't pass the "invariant TSC" feature
>> through to guests on my machine even though "invtsc" is set in QEMU
>> and the kernel host code appears to support it.  What gives?
>
> I think I solved part of the puzzle.  KVM doesn't like to advertise
> invtsc by default because that breaks migration.  (Oddly, the end
> result seems wrong -- with migration, the TSC doesn't stop, but it's
> not constant, and X86_FEATURE_CONSTANT_TSC is nonetheless set, but
> whatever.)

QEMU probably missed that because X86_FEATURE_CONSTANT_TSC is a function
of family/model.  (CONSTANT_TSC is the same as invariant TSC for KVM
guests, as they don't have C-states.)

> So the scheduler clock doesn't get marked stable.

A stable sched clock is quite unrelated to TSC features.  KVM versions
from the last few years should always give a good enough result to allow
a stable sched clock.  We wanted realtime guests, and realtime Linux
needs no_hz=full, which depends on a stable sched clock.  The result is
a huge hack.
We'd need to say that migration creates powerful gravity fields to
faithfully migrate a constant/invariant TSC, but a stable sched clock
doesn't have such strict expectations about time.

> Is that it?
>
> This still doesn't explain why even explicitly trying to set invtsc
> doesn't seem to work.

Seems like a bug.  My cpuid is

  0x80000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000100

and QEMU says

  warning: host doesn't support requested feature: CPUID.80000007H:EDX.invtsc [bit 8]

I'll see if it's in KVM or QEMU.

(We should only forbid migrations to hosts with a different frequency
and without guest TSC scaling.)
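
For anyone who wants to check the host dump by hand: the QEMU warning names invtsc as CPUID.80000007H:EDX bit 8, and edx=0x00000100 has exactly that bit set. Below is a minimal sketch that decodes the dump line quoted above. The helper names parse_cpuid_line and has_invtsc are made up for illustration, and the regex assumes the "reg=0x..." layout shown in that line. It is not how KVM or QEMU decide what to advertise.

```python
import re

# Bit position taken from the QEMU warning text:
# CPUID.80000007H:EDX.invtsc [bit 8]
INVTSC_BIT = 8


def parse_cpuid_line(line):
    """Hypothetical helper: extract eax/ebx/ecx/edx from one dump line.

    Assumes the 'reg=0x...' layout of the line quoted in the mail.
    """
    pairs = re.findall(r"\b(e[abcd]x)=0x([0-9a-fA-F]+)", line)
    return {name: int(value, 16) for name, value in pairs}


def has_invtsc(edx):
    """Return True if bit 8 of the given EDX value is set."""
    return bool((edx >> INVTSC_BIT) & 1)


line = ("0x80000007 0x00: eax=0x00000000 ebx=0x00000000 "
        "ecx=0x00000000 edx=0x00000100")
regs = parse_cpuid_line(line)
print(has_invtsc(regs["edx"]))
```

With the values from the mail, the host side does report the bit, which is consistent with the warning being a KVM or QEMU filtering issue rather than a missing hardware feature.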