On Mar 17, 2016 8:10 AM, "Radim Krcmar" <rkrcmar@xxxxxxxxxx> wrote:
>
> 2016-03-16 16:07-0700, Andy Lutomirski:
> > On Wed, Mar 16, 2016 at 3:59 PM, Radim Krcmar <rkrcmar@xxxxxxxxxx> wrote:
> >> 2016-03-16 15:15-0700, Andy Lutomirski:
> >>> FWIW, if you ever intend to support ART ("always running timer")
> >>> passthrough, this is going to be a giant clusterfsck. Good luck. I
> >>> haven't gotten a straight answer as to what hardware actually supports
> >>> that thing, so even testing isn't so easy.
> >>
> >> Hm, AR TSC would be best handled by doing nothing ... dropping the
> >> faking logic just became tempting.
>
> ART is different from what I initially thought, it's the underlying
> mechanism for invariant TSC and nothing more ... we already forbid
> migrations when the guest knows about invariant TSC, so we could do the
> same and let ART be virtualized. (Suspend has to be forbidden too.)

It's more than that -- it's a TSC-like clock that can be read by PCIe
devices.

>
> > As it stands, ART is screwed if you adjust the VMCS's tsc offset. But
>
> Luckily, assigning real hardware can prevent migration or suspend, so we
> won't need to adjust the offset during runtime. TSC is a generally
> unmigratable device that just happens to live on the CPU.
>
> (It would have been better to hide TSC capability from the guest and only
> use rdtsc for kvmclock if the guest wanted fancy features.)
>

I think that, if KVM passes through an ART-supporting NIC, it might be
rather messy to try to avoid passing through TSC as well. But maybe a
pvclock-like structure could expose the ART-kvmclock offset and scale.

> > I think it's also screwed if you migrate to a machine with a different
> > ratio of guest TSC ticks to host ART ticks or a different offset,
> > because the host isn't going to do the rdmsr every time it tries to
> > access the ART, so passing it through might require a paravirt
> > mechanism no matter what.
>
> It's almost certain that the other host will have a different offset,
> which makes TSC unmigratable in software without even considering ART
> or frequencies. Well, KVM already emulates different TSC frequency, so
> we could emulate ART without sinking much lower. :)
>
> > ISTM that, if KVM tries to keep the guest TSC monotonic across
> > migration, it should probably also keep it monotonic across host
> > suspend/resume.
>
> Yes, "Pausing" TSC during suspend or migration is one way of improving
> the TSC estimate. If we want to emulate ART, then the estimate is
> noticeably lacking, because TSC and ART are defined by a simple
> equation (SDM 2015-12, 17.14.4 Invariant Time-Keeping):
>
>   TSC_Value = (ART_Value * CPUID.15H:EBX[31:0]) / CPUID.15H:EAX[31:0] + K
>
> where the guest thinks that CPUID and K are constant (between events
> that the guest knows of), so we should give the best estimate of how
> many TSC cycles have passed. (The best estimate is still lacking.)
>
> > After all, host suspend/resume is kind of like
> > migrating from the pre-suspend host to the post-resume host. Maybe it
> > could even share code.
>
> Hopefully ... host suspend/resume is driven by kernel and migration is
> driven by userspace, which might complicate sharing.

Good point.

--Andy
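
For reference, the SDM relation quoted above can be sketched in C roughly
as follows. This is only an illustration, not KVM or kernel code: the
struct art_tsc_ratio type and the art_to_tsc() name are hypothetical, and
the sketch assumes the caller has already read CPUID.15H and knows K. The
division is split into quotient and remainder so the intermediate product
stays within 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the constants in the SDM equation. */
struct art_tsc_ratio {
	uint32_t numerator;   /* CPUID.15H:EBX[31:0] */
	uint32_t denominator; /* CPUID.15H:EAX[31:0] */
	uint64_t offset;      /* K */
};

/*
 * Compute TSC = (ART * numerator) / denominator + K.
 *
 * Returns false if the denominator is zero, i.e. the ratio is not
 * enumerated. The result wraps modulo 2^64 for very large ART values.
 */
static bool art_to_tsc(const struct art_tsc_ratio *r, uint64_t art,
		       uint64_t *tsc)
{
	if (r->denominator == 0)
		return false;

	/* art = quot * denominator + rem, with rem < 2^32 */
	uint64_t quot = art / r->denominator;
	uint64_t rem = art % r->denominator;

	/* rem * numerator is below 2^64, so this step cannot overflow. */
	*tsc = quot * r->numerator +
	       (rem * r->numerator) / r->denominator +
	       r->offset;
	return true;
}
```

With a 3/2 ratio and K = 100, an ART reading of 5 maps to TSC 107.
Assuming the guest TSC is the host TSC plus the VMCS TSC offset (no
scaling), changing that offset at runtime amounts to changing K behind the
guest's back, which is the breakage discussed in the thread.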