On Tue, 2020-12-08 at 17:40 +0100, Thomas Gleixner wrote:
> On Tue, Dec 08 2020 at 13:13, Maxim Levitsky wrote:
> > On Mon, 2020-12-07 at 11:29 -0600, Oliver Upton wrote:
> > > How would a VMM maintain the phase relationship between guest TSCs
> > > using these ioctls?
> >
> > By using the nanosecond timestamp.
> >
> > While I did make it optional in the V2, it was done for the sole sake of being
> > able to set TSC on (re)boot to 0 from qemu, and for cases when qemu migrates
> > from a VM where the feature is not enabled.
> > In this case the tsc is set to the given value exactly, just like you
> > can do today with KVM_SET_MSRS.
> > In all other cases the nanosecond timestamp will be given.
> >
> > When the userspace uses the nanosecond timestamp, the phase relationship
> > would not only be maintained but be exact, even if TSC reads were not
> > synchronized and even if their restore on the target wasn't synchronized as well.
> >
> > Here is an example:
> >
> > Let's assume that TSC on source/target is synchronized, and that the guest TSC
> > is synchronized as well.
> >
> > Let's call the guest TSC frequency F (guest TSC increments by F each second)
> >
> > We do KVM_GET_TSC_STATE on vcpu0 and receive (t0,tsc0).
> > We do KVM_GET_TSC_STATE on vcpu1 after 1 second passed (exaggerated)
> > and receive (t0 + 1s, tsc0 + F)
>
> Why?
>
> You freeze the VM and store the realtime timestamp of doing that. At
> that point, assuming a fully synchronized host system, the only interesting
> thing to store is the guest offset, which is the same on all vCPUs and is
> known already.
>
> So on restore the only thing which needs to be adjusted is the guest
> wide offset.
>
>     newoffset = oldoffset + (now - tfreeze)
>
> Then set newoffset for all vCPUs. Anything else is complexity for no
> value and bound to fall apart in hard to debug ways.
>
> The offset is still the same for all vCPUs whether you can restore them
> in the same nanosecond or whether you need 3 minutes for each one. It
> does not matter because when you restore vCPU1 3 minutes after vCPU0
> then TSC has advanced 3 minutes as well. It's still correct from the
> guest POV.
>
> Even if you support TSCADJUST and let the guest write to it, that does not
> change the per guest offset at all. TSCADJUST is per [v]CPU and adds on
> top:
>
>     tscvcpu = tsc_host + guest_offset + TSC_ADJUST
>
> Scaling is just orthogonal and does not change any of this.

I agree with this, and I think that this is what we will end up doing.
Paulo, what do you think about this?

Best regards,
        Maxim Levitsky

>
> Thanks,
>
>         tglx
>
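
For illustration only, here is a minimal Python sketch of the two formulas quoted above, transcribed literally. It is not KVM code: the function names, the `tsc_hz` parameter, and the conversion of the `(now - tfreeze)` interval from nanoseconds into TSC ticks are all assumptions made for the example, and the sign convention is simply the one written in the mail. It shows why the time at which each vCPU is restored does not matter once all vCPUs share one guest-wide offset.

```python
NSEC_PER_SEC = 1_000_000_000


def ns_to_ticks(delta_ns: int, tsc_hz: int) -> int:
    """Convert a nanosecond interval to TSC ticks (hypothetical helper, not a KVM API)."""
    return delta_ns * tsc_hz // NSEC_PER_SEC


def restored_offset(old_offset: int, now_ns: int, tfreeze_ns: int, tsc_hz: int) -> int:
    """newoffset = oldoffset + (now - tfreeze), with the interval expressed in ticks."""
    return old_offset + ns_to_ticks(now_ns - tfreeze_ns, tsc_hz)


def vcpu_tsc(tsc_host: int, guest_offset: int, tsc_adjust: int = 0) -> int:
    """tscvcpu = tsc_host + guest_offset + TSC_ADJUST"""
    return tsc_host + guest_offset + tsc_adjust


if __name__ == "__main__":
    tsc_hz = 1_000_000_000  # assumed 1 GHz TSC, so one tick per nanosecond
    new_offset = restored_offset(
        old_offset=-5000,
        now_ns=12 * NSEC_PER_SEC,
        tfreeze_ns=10 * NSEC_PER_SEC,
        tsc_hz=tsc_hz,
    )

    # vCPU0 gets the offset immediately; vCPU1 gets the same offset 3 minutes
    # later. Sampled at the same host TSC value, both report the same guest TSC.
    host_tsc_later = 500 * tsc_hz
    tsc_vcpu0 = vcpu_tsc(host_tsc_later, new_offset)
    tsc_vcpu1 = vcpu_tsc(host_tsc_later, new_offset)
    print(tsc_vcpu0 == tsc_vcpu1)

    # A per-vCPU TSC_ADJUST adds on top and leaves the shared offset untouched.
    print(vcpu_tsc(host_tsc_later, new_offset, tsc_adjust=7) - tsc_vcpu0)
```

Because the guest TSC is computed from the host TSC plus a constant, the only per-vCPU difference that can survive a restore in this model is whatever the guest itself wrote to TSC_ADJUST.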