On Wed, 2023-09-13 at 18:37 +0800, Like Xu wrote: > From: Like Xu <likexu@xxxxxxxxxxx> > > The legacy API for setting the TSC is fundamentally broken, and only > allows userspace to set a TSC "now", without any way to account for > time lost to preemption between the calculation of the value, and the > kernel eventually handling the ioctl. > > To work around this we have had a hack which, if a TSC is set with a > value which is within a second's worth of a previous vCPU, assumes that > userspace actually intended them to be in sync and adjusts the newly- > written TSC value accordingly. > > Thus, when a VMM restores a guest after suspend or migration using the > legacy API, the TSCs aren't necessarily *right*, but at least they're > in sync. > > This trick falls down when restoring a guest which genuinely has been > running for less time than the 1 second of imprecision which we allow > for in the legacy API. On *creation* the first vCPU starts its TSC > counting from zero, and the subsequent vCPUs synchronize to that. But > then when the VMM tries to set the intended TSC value, because that's > within a second of what the last TSC synced to, it just adjusts it to > match that. > Proofreading my own words here... "it just adjusts it to match" is using the same pronoun for different things and is probably hard to follow. Perhaps "KVM just adjusts it to match" is nicer. > The correct answer is for the VMM not to use the legacy API of course. > > But we can pile further hacks onto our existing hackish ABI, and > declare that the *first* value written by userspace (on any vCPU) > should not be subject to this 'correction' to make it sync up with > values that only from the kernel's default vCPU creation. ^^ ... that only *come* from the kernel's... > > To that end: Add a flag in kvm->arch.user_set_tsc, protected by > kvm->arch.tsc_write_lock, to record that a TSC for at least one vCPU in > this KVM *has* been set by userspace. Make the 1-second slop hack only > trigger if that flag is already set. > > Reported-by: Yong He <alexyonghe@xxxxxxxxxxx> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217423 > Suggested-by: Oliver Upton <oliver.upton@xxxxxxxxx> > Original-by: Oliver Upton <oliver.upton@xxxxxxxxx> > Original-by: Sean Christopherson <seanjc@xxxxxxxxxx> > Co-developed-by: David Woodhouse <dwmw2@xxxxxxxxxxxxx> > Signed-off-by: David Woodhouse <dwmw2@xxxxxxxxxxxxx> > Signed-off-by: Like Xu <likexu@xxxxxxxxxxx> > Tested-by: Yong He <alexyonghe@xxxxxxxxxxx> Reviewed-by: David Woodhouse <dwmw@xxxxxxxxxxxx> Please remove the 'Signed-off-by' from me. You must never ever *type* a signed-off-by line for anyone else. You only ever cut and paste those intact when they have provided them for *themselves*. It's OK to remove the Co-developed-by: too. You did the actual typing of the code here; I just heckled :)
Attachment:
smime.p7s
Description: S/MIME cryptographic signature