Re: [PATCH 5/7] target/i386/kvm: reset AMD PMU registers during VM reset

dongli.zhang@xxxxxxxxxx · Fri, 8 Nov 2024 10:04:41 -0800

Hi Maksim,

On 11/8/24 6:07 AM, Maksim Davydov wrote:
> 
> 

[snip]

>>>> +
>>>> +    num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
>>>> +}
>>>
>>> It seems that AMD implementation has one issue.
>>> KVM has parameter `enable_pmu`. So vPMU can be disabled in another way, not only
>>> via KVM_PMU_CAP_DISABLE. For Intel it's not a problem, because the vPMU
>>> initialization uses info from KVM_GET_SUPPORTED_CPUID. The enable_pmu state is
>>> reflected in KVM_GET_SUPPORTED_CPUID.  Thus no PMU MSRs in kvm_put_msrs/
>>> kvm_get_msrs will be used.
>>>
>>> But on AMD we don't use information from KVM_GET_SUPPORTED_CPUID to set an
>>> appropriate number of PMU registers. So, if vPMU is disabled by KVM parameter
>>> `enable_pmu` and pmu-cap-disable=false, then has_pmu_version will be 1 after
>>> kvm_init_pmu_info_amd execution. It means that in kvm_put_msrs/kvm_get_msrs 4
>>> PMU counters will be processed, but the correct behavior in that situation is to
>>> skip all PMU registers.
>>> I think we should get info from KVM to fix that.
>>>
>>> I tested this series on Zen2 and found that PMU MSRs were still processed during
>>> initialization even with enable_pmu=N. But it doesn't lead to any errors in QEMU
>>
>> Thank you very much for the feedback and helping catch the bug!
>>
>> When enable_pmu=N, the QEMU (with this patchset) cannot tell if vPMU is
>> supported via KVM_CAP_PMU_CAPABILITY.
>>
>> As it cannot disable the PMU, it falls to the legacy 4 counters.
>>
>> It falls to 4 counters because KVM disableds PERFCORE on enable_pmu=Y, i.e.,
>>
>> 5220         if (enable_pmu) {
>> 5221                 /*
>> 5222                  * Enumerate support for PERFCTR_CORE if and only if KVM has
>> 5223                  * access to enough counters to virtualize "core" support,
>> 5224                  * otherwise limit vPMU support to the legacy number of
>> counters.
>> 5225                  */
>> 5226                 if (kvm_pmu_cap.num_counters_gp < AMD64_NUM_COUNTERS_CORE)
>> 5227                         kvm_pmu_cap.num_counters_gp =
>> min(AMD64_NUM_COUNTERS,
>> 5228
>> kvm_pmu_cap.num_counters_gp);
>> 5229                 else
>> 5230                         kvm_cpu_cap_check_and_set(X86_FEATURE_PERFCTR_CORE);
>> 5231
>> 5232                 if (kvm_pmu_cap.version != 2 ||
>> 5233                     !kvm_cpu_cap_has(X86_FEATURE_PERFCTR_CORE))
>> 5234                         kvm_cpu_cap_clear(X86_FEATURE_PERFMON_V2);
>> 5235         }
>>
>>
>> During the bootup and reset, the QEMU (with this patchset) erroneously resets
>> MSRs for the 4 PMCs, via line 3827.
>>
>> 3825 static int kvm_buf_set_msrs(X86CPU *cpu)
>> 3826 {
>> 3827     int ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, cpu->kvm_msr_buf);
>> 3828     if (ret < 0) {
>> 3829         return ret;
>> 3830     }
>> 3831
>> 3832     if (ret < cpu->kvm_msr_buf->nmsrs) {
>> 3833         struct kvm_msr_entry *e = &cpu->kvm_msr_buf->entries[ret];
>> 3834         error_report("error: failed to set MSR 0x%" PRIx32 " to 0x%" PRIx64,
>> 3835                      (uint32_t)e->index, (uint64_t)e->data);
>> 3836     }
>> 3837
>> 3838     assert(ret == cpu->kvm_msr_buf->nmsrs);
>> 3839     return 0;
>> 3840 }
>>
>> Because enable_pmu=N, the KVM doesn't support those registers. However, it
>> returns 0 (not 1), because the KVM does nothing in the implicit else (i.e., line
>> 4144).
>>
>> 3847 int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> 3848 {
>> ... ...
>> 4138         case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
>> 4139         case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1:
>> 4140         case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL3:
>> 4141         case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1:
>> 4142                 if (kvm_pmu_is_valid_msr(vcpu, msr))
>> 4143                         return kvm_pmu_set_msr(vcpu, msr_info);
>> 4144
>> 4145                 if (data)
>> 4146                         kvm_pr_unimpl_wrmsr(vcpu, msr, data);
>> 4147                 break;
>> ... ...
>> 4224         default:
>> 4225                 if (kvm_pmu_is_valid_msr(vcpu, msr))
>> 4226                         return kvm_pmu_set_msr(vcpu, msr_info);
>> 4227
>> 4228                 /*
>> 4229                  * Userspace is allowed to write '0' to MSRs that KVM
>> reports
>> 4230                  * as to-be-saved, even if an MSRs isn't fully supported.
>> 4231                  */
>> 4232                 if (msr_info->host_initiated && !data &&
>> 4233                     kvm_is_msr_to_save(msr))
>> 4234                         break;
>> 4235
>> 4236                 return KVM_MSR_RET_INVALID;
>> 4237         }
>> 4238         return 0;
>> 4239 }
>> 4240 EXPORT_SYMBOL_GPL(kvm_set_msr_common);
>>
>> Fortunately, it returns 0 at line 4238. No error is detected by QEMU.
>>
>> Perhaps I may need to send message with a small patch to return 1 in the
>> implicit 'else' to kvm mailing list to confirm if that is expected.
>>
>> However, the answer is very likely 'expected', because line 4229 to line 4230
>> already explain it.
>>
> 
> Sorry for confusing you. My fault.
> I tested the previous series on Intel with the old kernel and QEMU failed with
> `error: failed to set MSR 0x38d to 0x0`. So I expected the same error.
> But as I can see, AMD PMU registers are processed differently than the Intel
> ones. Also the default MSR behavior in KVM has been changed since 2de154f541fc

I think the AMD PMU registers are treated equally with Intel ones as by the
commit 2de154f541fc. Both Intel and AMD PMU registers are in msrs_to_save_pmu[].

The objective was "to avoid spurious unsupported accesses".

> 
> I think that the current implementation with additional parameter pmu-cap-
> disabled does what we expect. The guest will see disabled PMU in the same two
> configurations:
> * pmu-cap-disabled=true and enabled_pmu=N
> * pmu-cap-disabled=true and enabled_pmu=Y
> But in QEMU these two configurations will have different states (has_pmu_version
> 1 and 0 respectively). I think it should be taken into account in the

Although the unsupported MSR write doesn't trigger any issue (thanks to
msrs_to_save_pmu[]), I agree this is the bug that I will address in v2.

Thanks to the reminder, indeed I have noticed another issue to be addressed in
v2: something unexpected may happen if we migrate from old KVM to new KVM
(assuming same QEMU versions).

Suppose one user never notice "-pmu" doesn't work on old AMD KVM, but still add
"-pmu" to QEMU command line.

old AMD KVM: "-pmu" doesn't take effect, due to the lack of KVM_CAP_PMU_CAPABILITY.

new AMD KVM: "-pmu" takes effect.

After the migration, the vPMU won't work any longer from guest's perspective.

> implementation without pmu-cap-disabled (which was suggested before) to save
> guest-visible state during migration.

Yes, I am going to revert back to my previous solution with "-pmu".

Thanks everyone's suggestion on "-pmu" vs. "pmu-cap-disabled". To finalize the
decision helps move forward.

Would you mind clarify "without pmu-cap-disabled (which was suggested before) to
save guest-visible state during migration."?

Would you mean the compatibility issue between old QEMU version (without
"pmu-cap-disabled") and new QEMU version (with "pmu-cap-disabled")?

Thank you very much!

Dongli Zhang