On Wed, Jan 24, 2024, Mingwei Zhang wrote: > Reset vcpu->arch.perf_capabilities to 0 if PDCM is disabled in guest cpuid. > Without this, there is an issue in live migration. In particular, to > migrate a VM with no PDCM enabled, VMM on the source is able to retrieve a > non-zero value by reading the MSR_IA32_PERF_CAPABILITIES. However, VMM on > the target is unable to set the value. This creates confusions on the user > side. > > Fundamentally, it is because vcpu->arch.perf_capabilities as the cached > value of MSR_IA32_PERF_CAPABILITIES is incorrect, and there is nothing > wrong on the kvm_get_msr_common() which just reads > vcpu->arch.perf_capabilities. > > Fix the issue by adding the reset code in kvm_vcpu_after_set_cpuid(), i.e. > early in VM setup time. > > Cc: Aaron Lewis <aaronlewis@xxxxxxxxxx> > Signed-off-by: Mingwei Zhang <mizhang@xxxxxxxxxx> > --- > arch/x86/kvm/cpuid.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > index adba49afb5fe..416bee03c42a 100644 > --- a/arch/x86/kvm/cpuid.c > +++ b/arch/x86/kvm/cpuid.c > @@ -369,6 +369,9 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu); > vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu); > > + /* Reset MSR_IA32_PERF_CAPABILITIES guest value to 0 if PDCM is off. */ > + if (!guest_cpuid_has(vcpu, X86_FEATURE_PDCM)) > + vcpu->arch.perf_capabilities = 0; No, this is just papering over the underlying bug. KVM shouldn't be stuffing vcpu->arch.perf_capabilities without explicit writes from host userspace. E.g KVM_SET_CPUID{,2} is allowed multiple times, at which point KVM could clobber a host userspace write to MSR_IA32_PERF_CAPABILITIES. It's unlikely any userspace actually does something like that, but KVM overwriting guest state is almost never a good thing. I've been meaning to send a patch for a long time (IIRC, Aaron also ran into this?). KVM needs to simply not stuff vcpu->arch.perf_capabilities. I believe we are already fudging around this in our internal kernels, so I don't think there's a need to carry a hack-a-fix for the destination kernel. diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 27e23714e960..fdef9d706d61 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -12116,7 +12116,6 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) kvm_async_pf_hash_reset(vcpu); - vcpu->arch.perf_capabilities = kvm_caps.supported_perf_cap; kvm_pmu_init(vcpu); vcpu->arch.pending_external_vector = -1; > kvm_pmu_refresh(vcpu); > vcpu->arch.cr4_guest_rsvd_bits = > __cr4_reserved_bits(guest_cpuid_has, vcpu); > -- > 2.43.0.429.g432eaa2c6b-goog >