On Wed, Mar 13, 2024, Wei W Wang wrote: > On Wednesday, March 13, 2024 8:38 AM, Mingwei Zhang wrote: > > Return correct value of IA32_PERF_CAPABILITIES when userspace tries to read > > it after vCPU has already run. Previously, KVM will always return the guest > > cached value on get_msr() even if guest CPUID lacks X86_FEATURE_PDCM. The > > guest cached value on default is kvm_caps.supported_perf_cap. However, > > when userspace sets the value during live migration, the call fails because of > > the check on X86_FEATURE_PDCM. > > Could you point where in the set_msr path that could fail? > (I don’t find there is a check of X86_FEATURE_PDCM in vmx_set_msr and > kvm_set_msr_common) The changelog is misleading, it's not the PDCM feature bit, it's the PMU version check in vmx_set_msr(): case MSR_IA32_PERF_CAPABILITIES: if (data && !vcpu_to_pmu(vcpu)->version) return 1; > > Initially, it sounds like a pure userspace issue. It is not. After vCPU has run, > > KVM should faithfully return correct value to satisify legitimate requests from > > userspace such as VM suspend/resume and live migrartion. In this case, KVM > > should return 0 when guest cpuid lacks X86_FEATURE_PDCM. > Some typos above (satisfy, migration, CPUID) > > Seems the description here isn’t aligned to your code below? > The code below prevents userspace from reading the MSR value (not return 0 as the > read value) in that case. Ya. > >So fix the > > problem by adding an additional check in vmx_set_msr(). > > > > Note that IA32_PERF_CAPABILITIES is emulated on AMD side, which is fine > > because it set_msr() is guarded by kvm_caps.supported_perf_cap which is > > always 0. > > > > Cc: Aaron Lewis <aaronlewis@xxxxxxxxxx> > > Cc: Jim Mattson <jmattson@xxxxxxxxxx> > > Signed-off-by: Mingwei Zhang <mizhang@xxxxxxxxxx> > > --- > > arch/x86/kvm/vmx/vmx.c | 11 +++++++++++ > > 1 file changed, 11 insertions(+) > > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index > > 40e3780d73ae..6d8667b56091 100644 > > --- a/arch/x86/kvm/vmx/vmx.c > > +++ b/arch/x86/kvm/vmx/vmx.c > > @@ -2049,6 +2049,17 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, > > struct msr_data *msr_info) > > msr_info->data = to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash > > [msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0]; > > break; > > + case MSR_IA32_PERF_CAPABILITIES: > > + /* > > + * Host VMM should not get potentially invalid MSR value if > > vCPU > > + * has already run but guest cpuid lacks the support for the > > + * MSR. > > + */ > > + if (msr_info->host_initiated && > > + kvm_vcpu_has_run(vcpu) && > > + !guest_cpuid_has(vcpu, X86_FEATURE_PDCM)) > > + return 1; As Wei pointed out, this doesn't match the changelog. And I don't think this is what we want, at least not in isolation. Making KVM more restrictive on userspace reads doesn't solve the live migration save/restore issue, and the kvm_vcpu_has_run() adds yet another flavor of MSR handling. We discussed this whole MSRs mess at PUCK this morning. I forgot to hit RECORD, but Paolo took notes and will post them soon. Going from memory, the plan is to: 1. Commit to, and document, that userspace must do KVM_SET_CPUID{,2} prior to KVM_SET_MSRS. 2. Go with roughly what I proposed in the CET thread (for unsupported MSRS, read 0 and drop writes of '0')[*]. 3. Add a quire for PERF_CAPABILITIES, ARCH_CAPABILITIES, and PLATFORM_INFO (if quirk==enabled, keep KVM's current behavior; if quirk==disabled, zero- initialize the MSRs). With those pieces in place, KVM can simply check X86_FEATURE_PDCM for both reads and writes to PERF_CAPABILITIES, and the common userspace MSR handling will convert "unsupported" to "success" as appropriate. [*] https://lore.kernel.org/all/ZfDdS8rtVtyEr0UR@xxxxxxxxxx