Re: [PATCH v2 1/3] KVM: x86: implement KVM_{GET|SET}_TSC_STATE

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Thu, 10 Dec 2020 07:16:10 -0800

> On Dec 10, 2020, at 6:52 AM, Maxim Levitsky <mlevitsk@xxxxxxxxxx> wrote:
> 
> On Thu, 2020-12-10 at 12:48 +0100, Paolo Bonzini wrote:
>> On 08/12/20 22:20, Thomas Gleixner wrote:
>>> So now live migration comes a long time after timekeeping had set the
>>> limits and just because it's virt it expects that everything works and it
>>> just can ignore these limits.
>>> 
>>> TBH. That's not any different than SMM or hard/firmware taking the
>>> machine out for lunch. It's exactly the same: It's broken.
>> 
>> I agree.  If *live* migration stops the VM for 200 seconds, it's broken.
>> 
>> Sure, there's the case of snapshotting the VM over the weekend.  My 
>> favorite solution would be to just put it in S3 before doing that.  *Do 
>> what bare metal does* and you can't go that wrong.
> 
> Note though that qemu has a couple of issues with s3, and it is disabled 
> by default in libvirt. 
> I would be very happy to work on improving this if there is a need for that.

There’s also the case where a VM is running on a laptop and someone closes the lid. The host QEMU might not get a chance to convince the guest to enter S3 first.

> 
> 
>> 
>> In general it's userspace policy whether to keep the TSC value the same 
>> across live migration.  There's pros and cons to both approaches, so KVM 
>> should provide the functionality to keep the TSC running (which the 
>> guest will see as a very long, but not extreme SMI), and this is what 
>> this series does.  Maxim will change it to operate per-VM.  Thanks 
>> Thomas, Oliver and everyone else for the input.
> 
> I agree with that.
> 
> I still think though that we should have a discussion on feasibility
> of making the kernel time code deal with large *forward* tsc jumps 
> without crashing.
> 
> If that is indeed hard to do, or will cause performance issues,
> then I agree that we might indeed inform the guest of time jumps instead.
> 

Tglx, even without fancy shared host/guest timekeeping, could the guest kernel manage to update its timekeeping if the host sent the guest an interrupt or NMI on all CPUs synchronously on resume?

Alternatively, if we had the explicit “max TSC value that makes sense right now” in the timekeeping data, the guest would reliably notice the large jump and could at least do something intelligent about it instead of overflowing its internal calculation.
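
To make the second idea concrete, here is a rough userspace sketch of what I mean. All names here (struct tk_snapshot, max_cycle, tk_set_limit, tk_read_ns) are made up for illustration and are not existing kernel API; it only assumes the usual (delta * mult) >> shift conversion from cycles to nanoseconds. The point is that the reader checks the TSC against an explicit upper bound before doing the multiplication, so a huge forward jump is reported instead of silently wrapping.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of timekeeping data; field names are illustrative. */
struct tk_snapshot {
	uint64_t cycle_last;	/* TSC value at the last timekeeping update */
	uint64_t base_ns;	/* nanoseconds corresponding to cycle_last */
	uint32_t mult;		/* cycles -> ns multiplier */
	uint32_t shift;		/* cycles -> ns shift */
	uint64_t max_cycle;	/* largest TSC value this snapshot can handle */
};

/* Largest delta for which delta * mult still fits in 64 bits. */
static uint64_t tk_max_safe_delta(uint32_t mult)
{
	return mult ? UINT64_MAX / mult : UINT64_MAX;
}

/* Record the "max TSC value that makes sense right now", saturating at 2^64-1. */
static void tk_set_limit(struct tk_snapshot *tk)
{
	uint64_t delta = tk_max_safe_delta(tk->mult);

	if (tk->cycle_last > UINT64_MAX - delta)
		tk->max_cycle = UINT64_MAX;
	else
		tk->max_cycle = tk->cycle_last + delta;
}

/*
 * Convert a TSC reading to nanoseconds.  Returns false if the reading is
 * outside [cycle_last, max_cycle], i.e. the jump is too large (or backwards)
 * for this snapshot, so the caller can resync instead of using a wrapped value.
 */
static bool tk_read_ns(const struct tk_snapshot *tk, uint64_t now, uint64_t *ns)
{
	if (now < tk->cycle_last || now > tk->max_cycle)
		return false;

	*ns = tk->base_ns + (((now - tk->cycle_last) * tk->mult) >> tk->shift);
	return true;
}
```

A caller would run tk_set_limit() whenever cycle_last is updated, and treat a false return from tk_read_ns() as "time jumped, go ask for a fresh base" rather than trusting the result. What the guest does at that point is a separate policy question.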