On Mon, 2023-10-02 at 09:37 -0700, Sean Christopherson wrote:
> On Mon, Oct 02, 2023, David Woodhouse wrote:
> > On Fri, 2023-09-29 at 13:15 -0700, Dongli Zhang wrote:
> > > 
> > > 1. The vcpu->hv_clock (kvmclock) is based on its own mult/shift/equation.
> > > 
> > > 2. The raw monotonic (tsc_clocksource) uses different mult/shift/equation.
> > > 
> > 
> > That just seems wrong. I don't mean that you're incorrect; it seems
> > *morally* wrong.
> > 
> > In a system with X86_FEATURE_CONSTANT_TSC, why would KVM choose to use
> > a *different* mult/shift/equation (your #1) to convert TSC ticks to
> > nanoseconds than the host CLOCK_MONOTONIC_RAW does (your #2).
> > 
> > I understand that KVM can't track the host's CLOCK_MONOTONIC, as it's
> > adjusted by NTP. But CLOCK_MONOTONIC_RAW is supposed to be consistent.
> > 
> > Fix that, and the whole problem goes away, doesn't it?
> > 
> > What am I missing here, that means we can't do that?
> 
> I believe the answer is that "struct pvclock_vcpu_time_info" and its math are
> ABI between KVM and KVM guests.
> 
> Like many of the older bits of KVM, my guess is that KVM's behavior is the product
> of making things kinda sorta work with old hardware, i.e. was probably the least
> awful solution in the days before constant TSCs, but is completely nonsensical on
> modern hardware.

I still don't understand. The ABI and its math are fine. The ABI is just
"at time X the TSC was Y, and the TSC frequency is Z"

I understand why on older hardware, those values needed to *change*
occasionally when TSC stupidity happened.

But on newer hardware, surely we can set them precisely *once* when the
VM starts, and never ever have to change them again? Theoretically not
even when we pause the VM, kexec into a new kernel, and resume the VM!

But we *are* having to change it, because apparently
CLOCK_MONOTONIC_RAW is doing something *other* than incrementing at
precisely the frequency of the known and constant TSC.

But *why* is CLOCK_MONOTONIC_RAW doing that? I thought that the whole
point of CLOCK_MONOTONIC_RAW was to be consistent and not adjusted by
NTP etc.? Shouldn't it run at precisely the same rate as the kvmclock,
with no skew at all?

And if CLOCK_MONOTONIC_RAW is not what I thought it was... do we really
have to keep resetting the kvmclock to it at all? On modern hardware
can't the kvmclock be defined by the TSC alone?
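
To make the "different mult/shift/equation" point concrete, here is a rough Python sketch of how two fixed-point conversions of the same constant TSC can drift apart. The pvclock form, ns = ((delta << tsc_shift) * tsc_to_system_mul) >> 32, follows the pvclock_vcpu_time_info fields; the clocksource form is the usual (cycles * mult) >> shift. Everything else is an assumption for illustration only: the 3 GHz TSC, the clocksource shift of 24, and the helper names pvclock_mul_shift() and clocksource_mult(), which are hypothetical and do not reproduce the kernel's own parameter selection.

```python
NSEC_PER_SEC = 1_000_000_000


def pvclock_mul_shift(tsc_hz):
    """Pick (mul, shift) so that ns = ((ticks << shift) * mul) >> 32.

    Hypothetical helper: normalizing mul into [2^31, 2^32) is an
    assumption of this sketch, not a copy of KVM's own routine.
    """
    def mul_for(shift):
        if shift >= 0:
            return (NSEC_PER_SEC << 32) // (tsc_hz << shift)
        return (NSEC_PER_SEC << (32 - shift)) // tsc_hz

    shift = 0
    while mul_for(shift) >= 1 << 32:   # multiplier too large: pre-shift left more
        shift += 1
    while mul_for(shift) < 1 << 31:    # multiplier too small: pre-shift right
        shift -= 1
    return mul_for(shift), shift


def pvclock_ns(ticks, mul, shift):
    """Scale a TSC delta the way the pvclock ABI describes."""
    ticks = ticks << shift if shift >= 0 else ticks >> -shift
    return (ticks * mul) >> 32


def clocksource_mult(tsc_hz, shift):
    """Hypothetical helper: rounded multiplier for a given clocksource shift."""
    return ((NSEC_PER_SEC << shift) + tsc_hz // 2) // tsc_hz


def clocksource_ns(ticks, mult, shift):
    """Scale a cycle delta with the generic clocksource mult/shift form."""
    return (ticks * mult) >> shift


TSC_HZ = 3_000_000_000   # assumed constant TSC frequency
CS_SHIFT = 24            # assumed clocksource shift, for illustration only

ticks_per_day = TSC_HZ * 86_400
pv_mul, pv_shift = pvclock_mul_shift(TSC_HZ)
cs_mult = clocksource_mult(TSC_HZ, CS_SHIFT)

# Same tick count, two conversions: the difference is pure rounding error
# in the multipliers, accumulated over one day of ticks.
skew_ns = (pvclock_ns(ticks_per_day, pv_mul, pv_shift)
           - clocksource_ns(ticks_per_day, cs_mult, CS_SHIFT))
print(f"pvclock mul={pv_mul} shift={pv_shift}, clocksource mult={cs_mult}")
print(f"skew after one day: {skew_ns} ns")
```

With these assumed parameters neither conversion is NTP-adjusted, yet they disagree by a few milliseconds per day, simply because the multiplier paired with the smaller shift carries fewer significant bits. Whether the real host parameters behave this way depends on the actual mult/shift values in use, which this sketch does not attempt to reproduce.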