(+Dapang & Zide) Hi Dongli, On Mon, Nov 04, 2024 at 01:40:17AM -0800, Dongli Zhang wrote: > Date: Mon, 4 Nov 2024 01:40:17 -0800 > From: Dongli Zhang <dongli.zhang@xxxxxxxxxx> > Subject: [PATCH 2/7] target/i386/kvm: introduce 'pmu-cap-disabled' to set > KVM_PMU_CAP_DISABLE > X-Mailer: git-send-email 2.43.5 > > The AMD PMU virtualization is not disabled when configuring > "-cpu host,-pmu" in the QEMU command line on an AMD server. Neither > "-cpu host,-pmu" nor "-cpu EPYC" effectively disables AMD PMU > virtualization in such an environment. > > As a result, VM logs typically show: > > [ 0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver. > > whereas the expected logs should be: > > [ 0.596381] Performance Events: PMU not available due to virtualization, using software events only. > [ 0.600972] NMI watchdog: Perf NMI watchdog permanently disabled > > This discrepancy occurs because AMD PMU does not use CPUID to determine > whether PMU virtualization is supported. Intel platform doesn't have this issue since Linux kernel fails to check the CPU family & model when "-cpu *,-pmu" option clears PMU version. The difference between Intel and AMD platforms, however, is that it seems Intel hardly ever reaches the “...due virtualization” message, but instead reports an error because it recognizes a mismatched family/model. This may be a drawback of the PMU driver's print message, but the result is the same, it prevents the PMU driver from enabling. So, please mention that KVM_PMU_CAP_DISABLE doesn't change the PMU behavior on Intel platform because current "pmu" property works as expected. > To address this, we introduce a new property, 'pmu-cap-disabled', for KVM > acceleration. This property sets KVM_PMU_CAP_DISABLE if > KVM_CAP_PMU_CAPABILITY is supported. Note that this feature currently > supports only x86 hosts, as KVM_CAP_PMU_CAPABILITY is used exclusively for > x86 systems. > > Signed-off-by: Dongli Zhang <dongli.zhang@xxxxxxxxxx> > --- > Another previous solution to re-use '-cpu host,-pmu': > https://lore.kernel.org/all/20221119122901.2469-1-dongli.zhang@xxxxxxxxxx/ IMO, I prefer the previous version. This VM-level KVM property is difficult to integrate with the existing CPU properties. Pls refer later comments for reasons. > accel/kvm/kvm-all.c | 1 + > include/sysemu/kvm_int.h | 1 + > qemu-options.hx | 9 ++++++- > target/i386/cpu.c | 2 +- > target/i386/kvm/kvm.c | 52 ++++++++++++++++++++++++++++++++++++++ > target/i386/kvm/kvm_i386.h | 2 ++ > 6 files changed, 65 insertions(+), 2 deletions(-) > > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c > index 801cff16a5..8b5ba45cf7 100644 > --- a/accel/kvm/kvm-all.c > +++ b/accel/kvm/kvm-all.c > @@ -3933,6 +3933,7 @@ static void kvm_accel_instance_init(Object *obj) > s->xen_evtchn_max_pirq = 256; > s->device = NULL; > s->msr_energy.enable = false; > + s->pmu_cap_disabled = false; > } The CPU property "pmu" also defaults to "false"...but: * max CPU would override this and try to enable PMU by default in max_x86_cpu_initfn(). * Other named CPU models keep the default setting to avoid affecting the migration. The pmu_cap_disabled and “pmu” property look unbound and unassociated, so this can cause the conflict when they are not synchronized. For example, -cpu host -accel kvm,pmu-cap-disabled=on The above options will fail to launch a VM (on Intel platform). Ideally, the “pmu” property and pmu-cap-disabled should be bound to each other and be consistent. But it's not easy because: - There is no proper way to have pmu_cap_disabled set different default values (e.g., "false" for max CPU and "true" for named CPU models) based on different CPU models. - And, no proper place to check the consistency of pmu_cap_disabled and enable_pmu. Therefore, I prefer your previous approach, to reuse current CPU "pmu" property. Further, considering that this is currently the only case that needs to to set the VM level's capability in the CPU context, there is no need to introduce a new kvm interface (in your previous patch), which can instead be set in kvm_cpu_realizefn(), like: diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c index 99d1941cf51c..05e9c9a1a0cf 100644 --- a/target/i386/kvm/kvm-cpu.c +++ b/target/i386/kvm/kvm-cpu.c @@ -42,6 +42,8 @@ static bool kvm_cpu_realizefn(CPUState *cs, Error **errp) { X86CPU *cpu = X86_CPU(cs); CPUX86State *env = &cpu->env; + KVMState *s = kvm_state; + static bool first = true; bool ret; /* @@ -63,6 +65,29 @@ static bool kvm_cpu_realizefn(CPUState *cs, Error **errp) * check/update ucode_rev, phys_bits, guest_phys_bits, mwait * cpu_common_realizefn() (via xcc->parent_realize) */ + + if (first) { + first = false; + + /* + * Since Linux v5.18, KVM provides a VM-level capability to easily + * disable PMUs; however, QEMU has been providing PMU property per + * CPU since v1.6. In order to accommodate both, have to configure + * the VM-level capability here. + */ + if (!cpu->enable_pmu && + kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY)) { + int r = kvm_vm_enable_cap(s, KVM_CAP_PMU_CAPABILITY, 0, + KVM_PMU_CAP_DISABLE); + + if (r < 0) { + error_setg(errp, "kvm: Failed to disable pmu cap: %s", + strerror(-r)); + return false; + } + } + } + if (cpu->max_features) { if (enable_cpu_pm) { if (kvm_has_waitpkg()) { --- In addition, if PMU is disabled, why not mask the perf related bits in 8000_0001_ECX? :) Regards, Zhao