On Wed, Mar 16, 2016 at 3:59 PM, Radim Krcmar <rkrcmar@xxxxxxxxxx> wrote:
> 2016-03-16 15:15-0700, Andy Lutomirski:
>> On Wed, Mar 16, 2016 at 3:06 PM, Radim Krcmar <rkrcmar@xxxxxxxxxx> wrote:
>>> Guest TSC is going to jump backward with this patch, which would make
>>> the guest think that a lot of cycles passed.  This has no bearing on
>>> guest timekeeping, because the guest shouldn't be using raw TSC.
>>> If we wanted to do something though, there are at least two options:
>>> 1) Fake that TSC continued at roughly its specified rate: compute how
>>>    many cycles could have elapsed while the CPU was suspended (using
>>>    host time before/after suspend and guest TSC frequency) and adjust
>>>    guest TSC.
>>> 2) Resume guest TSC at its last cycle before suspend.
>>>    (Roughly what KVM does now.)
>>>
>>> What are your opinions on TSC faking?
>>
>> I'd suggest restarting it wherever it left off, because it's simpler.
>> If there was a CLOCK_BOOT_RAW, you could try to track it, but I'm not
>> sure that such a thing exists.
>
> CLOCK_MONOTONIC_RAW can count in suspend, so CLOCK_BOOT_RAW would be a
> conditional alias and it probably doesn't exist because of that.
>
>> FWIW, if you ever intend to support ART ("always running timer")
>> passthrough, this is going to be a giant clusterfsck.  Good luck.  I
>> haven't gotten a straight answer as to what hardware actually supports
>> that thing, so even testing isn't easy.
>
> Hm, AR TSC would be best handled by doing nothing ... dropping the
> faking logic just became tempting.

As it stands, ART is screwed if you adjust the VMCS's TSC offset.  But
I think it's also screwed if you migrate to a machine with a different
ratio of guest TSC ticks to host ART ticks or a different offset,
because the host isn't going to do the rdmsr every time it tries to
access the ART, so passing it through might require a paravirt
mechanism no matter what.
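To make the two options quoted above concrete, here is a rough sketch
in C.  It is not KVM code: struct tsc_snapshot and every function name
below are made up for illustration, and it assumes no TSC scaling, so
the guest simply reads host TSC plus the VMCS TSC offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pre-suspend snapshot; not an actual KVM structure. */
struct tsc_snapshot {
	uint64_t guest_tsc;	/* last guest TSC value before suspend */
	uint64_t host_ns;	/* host time at suspend, in nanoseconds */
};

/* Option 2: resume the guest TSC exactly where it left off. */
static uint64_t resume_tsc_frozen(const struct tsc_snapshot *s)
{
	return s->guest_tsc;
}

/*
 * Option 1: pretend the TSC kept counting at guest_tsc_khz while the
 * host was suspended.  cycles = elapsed_ns * kHz / 10^6, computed in
 * two parts so the intermediate product stays small.
 */
static uint64_t resume_tsc_faked(const struct tsc_snapshot *s,
				 uint64_t host_ns_now,
				 uint32_t guest_tsc_khz)
{
	uint64_t elapsed_ns, cycles;

	assert(host_ns_now >= s->host_ns);
	elapsed_ns = host_ns_now - s->host_ns;
	cycles = (elapsed_ns / 1000000) * guest_tsc_khz +
		 (elapsed_ns % 1000000) * guest_tsc_khz / 1000000;
	return s->guest_tsc + cycles;
}

/*
 * Offset to program so that host_tsc_now + offset == target.
 * Unsigned arithmetic wraps modulo 2^64, which is what we want when
 * the host TSC is ahead of the target.
 */
static uint64_t tsc_offset_for(uint64_t target, uint64_t host_tsc_now)
{
	return target - host_tsc_now;
}
```

Either way the guest TSC stays monotonic; the only difference is
whether the suspended interval is visible to the guest as elapsed
cycles.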
ISTM that, if KVM tries to keep the guest TSC monotonic across
migration, it should probably also keep it monotonic across host
suspend/resume.  After all, host suspend/resume is kind of like
migrating from the pre-suspend host to the post-resume host.  Maybe it
could even share code.

>
>>> ---
>>> Btw. I'll be spending some days to decipher kvmclock, so I'd also fix
>>> the masterclock+suspend issue, if you don't mind ... So far, I don't
>>> even see a reason to update kvmclock on kvm_arch_hardware_enable().
>>> Suspend is a condition that we want to handle, so kvm_resume would be a
>>> better place, but we handle suspend only because TSC and timekeeping have
>>> changed, so I think that the right place is in their event notifiers.
>>
>> I'd be glad to try to review things.  Please cc me.
>
> Ok.
>
>> One of the Xen people pointed me at the MS Viridian spec for handling
>> TSC rate changes on migration to or from hosts that don't support TSC
>> scaling.  I wonder if KVM could use the same technique or even the
>> same API.
>
> The TSC frequency MSR is read-only in Xen, so I guess it's equivalent to
> pvclock.  I'll take a deeper look, thanks for pointers.

--
Andy Lutomirski
AMA Capital Management, LLC