On Fri, 2023-08-11 at 15:59 -0700, Sean Christopherson wrote:
> The problem isn't that the sync code doesn't differentiate between kernel and
> user-initiated writes, because parts of the code *do* differentiate. I think it's
> more accurate to say that the problem is that the sync code doesn't differentiate
> between userspace initializing the TSC and userspace attempting to synchronize the
> TSC.

I'm not utterly sure that *I* differentiate between userspace
"initializing the TSC" and attempting to "synchronize the TSC". What
*is* the difference?

Userspace is merely *setting* the TSC for a given vCPU, regardless of
whether other vCPUs even exist.

But we have to work around the fundamental brokenness of the legacy
API, whose semantics are most accurately described as "Please set the
TSC to precisely <x> because that's what it should have been *some*
time around now, if I wasn't preempted very much between when I
calculated it and when you see this ioctl".

That's why, for the legacy API only, we have this hack to make the
TSCs *actually* in sync if they're close. Because without it, there's
*no* way the VMM can restore a guest with its TSCs actually in sync.

I think the best answer to the bug report that led to this patch is
just "Don't use the legacy API then". Use KVM_VCPU_TSC_OFFSET, which is
defined as "the TSC was <x> at KVM time <y>" and is actually *sane*.
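
To illustrate why "the TSC was <x> at KVM time <y>" can be restored
exactly while "set the TSC to <x> around now" cannot, here is a rough
sketch of the arithmetic a VMM might do on restore. It is not taken from
any patch in this thread: struct tsc_snapshot, ns_to_ticks() and
restore_tsc_offset() are hypothetical names, and it assumes the guest
TSC is host TSC plus offset, that the guest TSC frequency is known in
kHz, and that a host TSC reading can be paired with a KVM clock value on
both sides. The result is the kind of value one would then write through
the KVM_VCPU_TSC_OFFSET attribute.

```c
#include <stdint.h>

/*
 * Hypothetical snapshot taken when the guest state was saved: a host TSC
 * reading, the KVM clock value at that same instant, and one vCPU's TSC
 * offset (assumption: guest TSC = host TSC + tsc_offset).
 */
struct tsc_snapshot {
	uint64_t host_tsc;
	uint64_t kvmclock_ns;
	int64_t  tsc_offset;
};

/*
 * Convert a signed nanosecond delta to guest TSC ticks. The 128-bit
 * intermediate (a GCC/Clang extension) avoids overflow on long pauses.
 */
static int64_t ns_to_ticks(int64_t delta_ns, uint32_t tsc_khz)
{
	return (int64_t)(((__int128)delta_ns * tsc_khz) / 1000000);
}

/*
 * Compute the offset to restore so that the guest TSC continues from its
 * saved value plus the KVM clock time elapsed since the snapshot. Nothing
 * here depends on *when* the caller runs, so preemption between the
 * calculation and the write does not skew the result.
 */
static int64_t restore_tsc_offset(const struct tsc_snapshot *src,
				  uint64_t dst_host_tsc,
				  uint64_t dst_kvmclock_ns,
				  uint32_t tsc_khz)
{
	/* Time that passed for the guest, according to the KVM clock. */
	int64_t elapsed_ns = (int64_t)(dst_kvmclock_ns - src->kvmclock_ns);
	/* Difference between the host TSC at save time and at restore time. */
	int64_t host_delta = (int64_t)(src->host_tsc - dst_host_tsc);

	return src->tsc_offset + ns_to_ticks(elapsed_ns, tsc_khz) + host_delta;
}
```

Because every vCPU's offset is shifted by the same amount, vCPUs whose
TSCs were in sync when saved remain in sync after restore, with no
"close enough" heuristic needed.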