2016-03-16 16:07-0700, Andy Lutomirski:
> On Wed, Mar 16, 2016 at 3:59 PM, Radim Krcmar <rkrcmar@xxxxxxxxxx> wrote:
>> 2016-03-16 15:15-0700, Andy Lutomirski:
>>> FWIW, if you ever intend to support ART ("always running timer")
>>> passthrough, this is going to be a giant clusterfsck.  Good luck.  I
>>> haven't gotten a straight answer as to what hardware actually supports
>>> that thing, so even testing isn't easy.
>>
>> Hm, AR TSC would be best handled by doing nothing ... dropping the
>> faking logic just became tempting.

ART is different from what I initially thought: it's the underlying
mechanism for invariant TSC and nothing more ... we already forbid
migrations when the guest knows about invariant TSC, so we could do the
same and let ART be virtualized.  (Suspend has to be forbidden too.)

> As it stands, ART is screwed if you adjust the VMCS's tsc offset.  But

Luckily, assigning real hardware can prevent migration or suspend, so we
won't need to adjust the offset during runtime.

TSC is a generally unmigratable device that just happens to live on the
CPU.  (It would have been better to hide TSC capability from the guest
and only use rdtsc for kvmclock if the guest wanted fancy features.)

> I think it's also screwed if you migrate to a machine with a different
> ratio of guest TSC ticks to host ART ticks or a different offset,
> because the host isn't going to do the rdmsr every time it tries to
> access the ART, so passing it through might require a paravirt
> mechanism no matter what.

It's almost certain that the other host will have a different offset,
which makes TSC unmigratable in software without even considering ART or
frequencies.

Well, KVM already emulates different TSC frequency, so we could emulate
ART without sinking much lower. :)

> ISTM that, if KVM tries to keep the guest TSC monotonic across
> migration, it should probably also keep it monotonic across host
> suspend/resume.
Yes, "pausing" TSC during suspend or migration is one way of improving
the TSC estimate.

If we want to emulate ART, then the estimate is noticeably lacking,
because TSC and ART are defined by a simple equation (SDM 2015-12,
17.14.4 Invariant Time-Keeping):

  TSC_Value = (ART_Value * CPUID.15H:EBX[31:0]) / CPUID.15H:EAX[31:0] + K

where the guest thinks that CPUID and K are constant (between events
that the guest knows of), so we should give the best estimate of how
many TSC cycles have passed.  (The best estimate is still lacking.)

> After all, host suspend/resume is kind of like
> migrating from the pre-suspend host to the post-resume host.  Maybe it
> could even share code.

Hopefully ... host suspend/resume is driven by kernel and migration is
driven by userspace, which might complicate sharing.
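To make the K problem concrete, here is a minimal sketch of the SDM
relation (TSC_Value = ART_Value * EBX / EAX + K) in Python.  Everything
in it is illustrative: the function names, the truncating integer
division, and the 84/2 ratio are assumptions for the example, not KVM or
kernel code and not values from real hardware.

```python
# Sketch of TSC_Value = (ART_Value * CPUID.15H:EBX) / CPUID.15H:EAX + K.
# All names are hypothetical; truncating division is an assumption.

def art_to_tsc(art_value, cpuid_15h_ebx, cpuid_15h_eax, k):
    """Convert an ART reading to the TSC value the guest would expect."""
    if cpuid_15h_eax == 0:
        raise ValueError("CPUID.15H:EAX is 0, ratio is not enumerated")
    return (art_value * cpuid_15h_ebx) // cpuid_15h_eax + k


def estimate_k(tsc_value, art_value, cpuid_15h_ebx, cpuid_15h_eax):
    """Recover K from a (TSC, ART) pair sampled at the same instant."""
    return tsc_value - (art_value * cpuid_15h_ebx) // cpuid_15h_eax


if __name__ == "__main__":
    EBX, EAX = 84, 2            # made-up ratio of 42 TSC ticks per ART tick
    art = 1_000_000             # made-up ART reading

    # The guest samples K once and assumes it stays constant.
    k_guest = estimate_k(art_to_tsc(art, EBX, EAX, 0), art, EBX, EAX)

    # The host then shifts the guest-visible TSC (e.g. a tsc offset
    # change) without the guest knowing about it.
    offset_change = -5000
    guest_tsc_now = art_to_tsc(art, EBX, EAX, 0) + offset_change

    # The guest's ART-derived TSC is now off by exactly that change.
    error = art_to_tsc(art, EBX, EAX, k_guest) - guest_tsc_now
    print("stale K:", k_guest, "error in TSC ticks:", error)
```

Under these assumptions, any change to the guest-visible TSC that is not
mirrored in K shows up one-to-one as an error in the guest's ART-to-TSC
conversion, and a change in the EBX/EAX ratio scales the error with the
ART value.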