On Wed, Jun 22, 2022, Paul Durrant wrote: > The scaling information in sub-leaf 1 should match the values in the > 'vcpu_info' sub-structure 'time_info' (a.k.a. pvclock_vcpu_time_info) which > is shared with the guest. The offset values are not set since a TSC offset > is already applied. > The host TSC frequency should also be set in sub-leaf 2. Explain why this is KVM's problem, i.e. why userspace is unable to set the correct values. > This patch adds a new kvm_xen_set_cpuid() function that scans for the Please avoid "This patch". > relevant CPUID leaf when the CPUID information is updated by the VMM and > stashes pointers to the sub-leaves in the kvm_vcpu_xen structure. > The values are then updated by a call to the, also new, > kvm_xen_setup_tsc_info() function made at the end of > kvm_guest_time_update() just before entering the guest. This is not a helpful paragraph, it provides zero information that isn't obvious from the code. The changelog should read something like: Update Xen CPUID leaves that expose TSC frequency and scaling information to the guest <blah blah blah>. Cache the leaves <blah blah blah>. > Signed-off-by: Paul Durrant <pdurrant@xxxxxxxxxx> > --- > arch/x86/include/asm/kvm_host.h | 2 ++ > arch/x86/kvm/cpuid.c | 2 ++ > arch/x86/kvm/x86.c | 1 + > arch/x86/kvm/xen.c | 41 +++++++++++++++++++++++++++++++++ > arch/x86/kvm/xen.h | 10 ++++++++ > 5 files changed, 56 insertions(+) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index 1038ccb7056a..f77a4940542f 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -638,6 +638,8 @@ struct kvm_vcpu_xen { > struct hrtimer timer; > int poll_evtchn; > struct timer_list poll_timer; > + struct kvm_cpuid_entry2 *tsc_info_1; > + struct kvm_cpuid_entry2 *tsc_info_2; > }; > > struct kvm_vcpu_arch { > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > index d47222ab8e6e..eb6cd88c974a 100644 > --- a/arch/x86/kvm/cpuid.c > +++ b/arch/x86/kvm/cpuid.c > @@ -25,6 +25,7 @@ > #include "mmu.h" > #include "trace.h" > #include "pmu.h" > +#include "xen.h" > > /* > * Unlike "struct cpuinfo_x86.x86_capability", kvm_cpu_caps doesn't need to be > @@ -310,6 +311,7 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > __cr4_reserved_bits(guest_cpuid_has, vcpu); > > kvm_hv_set_cpuid(vcpu); > + kvm_xen_set_cpuid(vcpu); > > /* Invoke the vendor callback only after the above state is updated. */ > static_call(kvm_x86_vcpu_after_set_cpuid)(vcpu); > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 00e23dc518e0..8b45f9975e45 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -3123,6 +3123,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) > if (vcpu->xen.vcpu_time_info_cache.active) > kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_time_info_cache, 0); > kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock); > + kvm_xen_setup_tsc_info(v); This can be called inside this if statement, no? if (unlikely(vcpu->hw_tsc_khz != tgt_tsc_khz)) { } > return 0; > } > > diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c > index 610beba35907..a016ff85264d 100644 > --- a/arch/x86/kvm/xen.c > +++ b/arch/x86/kvm/xen.c > @@ -10,6 +10,9 @@ > #include "xen.h" > #include "hyperv.h" > #include "lapic.h" > +#include "cpuid.h" > + > +#include <asm/xen/cpuid.h> > > #include <linux/eventfd.h> > #include <linux/kvm_host.h> > @@ -1855,3 +1858,41 @@ void kvm_xen_destroy_vm(struct kvm *kvm) > if (kvm->arch.xen_hvm_config.msr) > static_branch_slow_dec_deferred(&kvm_xen_enabled); > } > + > +void kvm_xen_set_cpuid(struct kvm_vcpu *vcpu) This is a very, very misleading name. It does not "set" anything. Given that this patch adds "set" and "setup", I expected the "set" to you know, set the CPUID leaves and the "setup" to prepar for that, not the other way around. If the leaves really do need to be cached, kvm_xen_after_set_cpuid() is probably the least awful name. > +{ > + u32 base = 0; > + u32 function; > + > + for_each_possible_hypervisor_cpuid_base(function) { > + struct kvm_cpuid_entry2 *entry = kvm_find_cpuid_entry(vcpu, function, 0); > + > + if (entry && > + entry->ebx == XEN_CPUID_SIGNATURE_EBX && > + entry->ecx == XEN_CPUID_SIGNATURE_ECX && > + entry->edx == XEN_CPUID_SIGNATURE_EDX) { > + base = function; > + break; > + } > + } > + if (!base) > + return; > + > + function = base | XEN_CPUID_LEAF(3); > + vcpu->arch.xen.tsc_info_1 = kvm_find_cpuid_entry(vcpu, function, 1); > + vcpu->arch.xen.tsc_info_2 = kvm_find_cpuid_entry(vcpu, function, 2); Is it really necessary to cache the leave? Guest CPUID isn't optimized, but it's not _that_ slow, and unless I'm missing something updating the TSC frequency and scaling info should be uncommon, i.e. not performance critical.