Hi Maksim, On 11/8/24 6:07 AM, Maksim Davydov wrote: > > [snip] >>>> + >>>> + num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE; >>>> +} >>> >>> It seems that AMD implementation has one issue. >>> KVM has parameter `enable_pmu`. So vPMU can be disabled in another way, not only >>> via KVM_PMU_CAP_DISABLE. For Intel it's not a problem, because the vPMU >>> initialization uses info from KVM_GET_SUPPORTED_CPUID. The enable_pmu state is >>> reflected in KVM_GET_SUPPORTED_CPUID. Thus no PMU MSRs in kvm_put_msrs/ >>> kvm_get_msrs will be used. >>> >>> But on AMD we don't use information from KVM_GET_SUPPORTED_CPUID to set an >>> appropriate number of PMU registers. So, if vPMU is disabled by KVM parameter >>> `enable_pmu` and pmu-cap-disable=false, then has_pmu_version will be 1 after >>> kvm_init_pmu_info_amd execution. It means that in kvm_put_msrs/kvm_get_msrs 4 >>> PMU counters will be processed, but the correct behavior in that situation is to >>> skip all PMU registers. >>> I think we should get info from KVM to fix that. >>> >>> I tested this series on Zen2 and found that PMU MSRs were still processed during >>> initialization even with enable_pmu=N. But it doesn't lead to any errors in QEMU >> >> Thank you very much for the feedback and helping catch the bug! >> >> When enable_pmu=N, the QEMU (with this patchset) cannot tell if vPMU is >> supported via KVM_CAP_PMU_CAPABILITY. >> >> As it cannot disable the PMU, it falls to the legacy 4 counters. >> >> It falls to 4 counters because KVM disableds PERFCORE on enable_pmu=Y, i.e., >> >> 5220 if (enable_pmu) { >> 5221 /* >> 5222 * Enumerate support for PERFCTR_CORE if and only if KVM has >> 5223 * access to enough counters to virtualize "core" support, >> 5224 * otherwise limit vPMU support to the legacy number of >> counters. >> 5225 */ >> 5226 if (kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS_CORE) >> 5227 kvm_pmu_cap.num_counters_gp = >> min(AMD64_NUM_COUNTERS, >> 5228 >> kvm_pmu_cap.num_counters_gp); >> 5229 else >> 5230 kvm_cpu_cap_check_and_set(X86_FEATURE_PERFCTR_CORE); >> 5231 >> 5232 if (kvm_pmu_cap.version != 2 || >> 5233 !kvm_cpu_cap_has(X86_FEATURE_PERFCTR_CORE)) >> 5234 kvm_cpu_cap_clear(X86_FEATURE_PERFMON_V2); >> 5235 } >> >> >> During the bootup and reset, the QEMU (with this patchset) erroneously resets >> MSRs for the 4 PMCs, via line 3827. >> >> 3825 static int kvm_buf_set_msrs(X86CPU *cpu) >> 3826 { >> 3827 int ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, cpu->kvm_msr_buf); >> 3828 if (ret < 0) { >> 3829 return ret; >> 3830 } >> 3831 >> 3832 if (ret < cpu->kvm_msr_buf->nmsrs) { >> 3833 struct kvm_msr_entry *e = &cpu->kvm_msr_buf->entries[ret]; >> 3834 error_report("error: failed to set MSR 0x%" PRIx32 " to 0x%" PRIx64, >> 3835 (uint32_t)e->index, (uint64_t)e->data); >> 3836 } >> 3837 >> 3838 assert(ret == cpu->kvm_msr_buf->nmsrs); >> 3839 return 0; >> 3840 } >> >> Because enable_pmu=N, the KVM doesn't support those registers. However, it >> returns 0 (not 1), because the KVM does nothing in the implicit else (i.e., line >> 4144). >> >> 3847 int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) >> 3848 { >> ... ... >> 4138 case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3: >> 4139 case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1: >> 4140 case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL3: >> 4141 case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1: >> 4142 if (kvm_pmu_is_valid_msr(vcpu, msr)) >> 4143 return kvm_pmu_set_msr(vcpu, msr_info); >> 4144 >> 4145 if (data) >> 4146 kvm_pr_unimpl_wrmsr(vcpu, msr, data); >> 4147 break; >> ... ... >> 4224 default: >> 4225 if (kvm_pmu_is_valid_msr(vcpu, msr)) >> 4226 return kvm_pmu_set_msr(vcpu, msr_info); >> 4227 >> 4228 /* >> 4229 * Userspace is allowed to write '0' to MSRs that KVM >> reports >> 4230 * as to-be-saved, even if an MSRs isn't fully supported. >> 4231 */ >> 4232 if (msr_info->host_initiated && !data && >> 4233 kvm_is_msr_to_save(msr)) >> 4234 break; >> 4235 >> 4236 return KVM_MSR_RET_INVALID; >> 4237 } >> 4238 return 0; >> 4239 } >> 4240 EXPORT_SYMBOL_GPL(kvm_set_msr_common); >> >> Fortunately, it returns 0 at line 4238. No error is detected by QEMU. >> >> Perhaps I may need to send message with a small patch to return 1 in the >> implicit 'else' to kvm mailing list to confirm if that is expected. >> >> However, the answer is very likely 'expected', because line 4229 to line 4230 >> already explain it. >> > > Sorry for confusing you. My fault. > I tested the previous series on Intel with the old kernel and QEMU failed with > `error: failed to set MSR 0x38d to 0x0`. So I expected the same error. > But as I can see, AMD PMU registers are processed differently than the Intel > ones. Also the default MSR behavior in KVM has been changed since 2de154f541fc I think the AMD PMU registers are treated equally with Intel ones as by the commit 2de154f541fc. Both Intel and AMD PMU registers are in msrs_to_save_pmu[]. The objective was "to avoid spurious unsupported accesses". > > I think that the current implementation with additional parameter pmu-cap- > disabled does what we expect. The guest will see disabled PMU in the same two > configurations: > * pmu-cap-disabled=true and enabled_pmu=N > * pmu-cap-disabled=true and enabled_pmu=Y > But in QEMU these two configurations will have different states (has_pmu_version > 1 and 0 respectively). I think it should be taken into account in the Although the unsupported MSR write doesn't trigger any issue (thanks to msrs_to_save_pmu[]), I agree this is the bug that I will address in v2. Thanks to the reminder, indeed I have noticed another issue to be addressed in v2: something unexpected may happen if we migrate from old KVM to new KVM (assuming same QEMU versions). Suppose one user never notice "-pmu" doesn't work on old AMD KVM, but still add "-pmu" to QEMU command line. old AMD KVM: "-pmu" doesn't take effect, due to the lack of KVM_CAP_PMU_CAPABILITY. new AMD KVM: "-pmu" takes effect. After the migration, the vPMU won't work any longer from guest's perspective. > implementation without pmu-cap-disabled (which was suggested before) to save > guest-visible state during migration. Yes, I am going to revert back to my previous solution with "-pmu". Thanks everyone's suggestion on "-pmu" vs. "pmu-cap-disabled". To finalize the decision helps move forward. Would you mind clarify "without pmu-cap-disabled (which was suggested before) to save guest-visible state during migration."? Would you mean the compatibility issue between old QEMU version (without "pmu-cap-disabled") and new QEMU version (with "pmu-cap-disabled")? Thank you very much! Dongli Zhang