After a second thought and further tests with the ftrace backend activated,
if we update those MSRs in the VMM (QEMU in my context) with
KVM_X86_SET_MSR_FILTER, each time we read the value we go through the
following KVM exit:

kvm_userspace_exit: reason KVM_EXIT_X86_RDMSR (29)

Because some of these MSRs (i.e. MSR_{PKG,PP0,PP1}_ENERGY_STATUS) are
counters (of uJoules), their values must be updated regularly to make sense
to the power tools. So I'm wondering whether the context switches
(KVM->userspace->KVM) needed to update all the MSRs will cause performance
issues?

> Do you have a QEMU prototype patch? My gut feeling is that these MSRs
> should be handled entirely within KVM using the sched_in and sched_out
> notifiers. This would also allow exposing the values to the host using
> the statistics subsystem.
>
> Paolo
>

I did some QEMU hack with Qtimer to update the values regularly, but I'm
definitely not satisfied with the implementation.

What I'm pretty sure of is that updating the values should be done
separately from the callback that consumes them. This would ensure the
consistency of the values.

Assuming those MSRs are handled within KVM, we can read the MSRs with
rdmsrl_safe(), but how can we get the percentage of CPU used by QEMU in
order to derive a proportional value of the counter?

Regards,
Anthony

> > Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > Cc: Christophe Fontaine <cfontain@xxxxxxxxxx>
> > Signed-off-by: Anthony Harivel <aharivel@xxxxxxxxxx>
> > ---
> >
> > Notes:
> >     The main goal of this patch is to bring a first step to give energy
> >     awareness to VMs.
> >
> >     As of today, KVM always report 0 in these MSRs since the entire host
> >     power consumption needs to be hidden from the guests. However, there is
> >     no fallback mechanism for VMs to measure their power usage.
> >
> >     The idea is to let the VMMs running on top of KVM periodically update
> >     those MSRs with representative values of the VM's power consumption.
> >
> >     If this solution is accepted, VMMs like QEMU will need to be patched to
> >     set proper values in these registers and enable power metering in
> >     guests.
> >
> >     I am submitting this as an RFC to get input/feedback from a broader
> >     audience who may be aware of potential side effects of such a mechanism.
> >
> >     Regards,
> >     Anthony
> >
> >     "If you can’t measure it, you can’t improve it." – Lord Kelvin
> >
> >  arch/x86/include/asm/kvm_host.h |  4 ++++
> >  arch/x86/kvm/x86.c              | 18 ++++++++++++++++--
> >  2 files changed, 20 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 6aaae18f1854..c6072915f229 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1006,6 +1006,10 @@ struct kvm_vcpu_arch {
> >          */
> >         bool pdptrs_from_userspace;
> >
> > +       /* Powercap related MSRs */
> > +       u64 msr_rapl_power_unit;
> > +       u64 msr_pkg_energy_status;
> > +
> >  #if IS_ENABLED(CONFIG_HYPERV)
> >         hpa_t hv_root_tdp;
> >  #endif
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index da4bbd043a7b..adc89144f84f 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1528,6 +1528,10 @@ static const u32 emulated_msrs_all[] = {
> >
> >         MSR_K7_HWCR,
> >         MSR_KVM_POLL_CONTROL,
> > +
> > +       /* The following MSRs can be updated by the userspace */
> > +       MSR_RAPL_POWER_UNIT,
> > +       MSR_PKG_ENERGY_STATUS,
> >  };
> >
> >  static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)];
> > @@ -3888,6 +3892,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >                  * as to-be-saved, even if an MSRs isn't fully supported.
> >                  */
> >                 return !msr_info->host_initiated || data;
> > +       case MSR_RAPL_POWER_UNIT:
> > +               vcpu->arch.msr_rapl_power_unit = data;
> > +               break;
> > +       case MSR_PKG_ENERGY_STATUS:
> > +               vcpu->arch.msr_pkg_energy_status = data;
> > +               break;
> >         default:
> >                 if (kvm_pmu_is_valid_msr(vcpu, msr))
> >                         return kvm_pmu_set_msr(vcpu, msr_info);
> > @@ -3973,13 +3983,17 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >          * data here. Do not conditionalize this on CPUID, as KVM does not do
> >          * so for existing CPU-specific MSRs.
> >          */
> > -       case MSR_RAPL_POWER_UNIT:
> >         case MSR_PP0_ENERGY_STATUS:     /* Power plane 0 (core) */
> >         case MSR_PP1_ENERGY_STATUS:     /* Power plane 1 (graphics uncore) */
> > -       case MSR_PKG_ENERGY_STATUS:     /* Total package */
> >         case MSR_DRAM_ENERGY_STATUS:    /* DRAM controller */
> >                 msr_info->data = 0;
> >                 break;
> > +       case MSR_RAPL_POWER_UNIT:
> > +               msr_info->data = vcpu->arch.msr_rapl_power_unit;
> > +               break;
> > +       case MSR_PKG_ENERGY_STATUS:     /* Total package */
> > +               msr_info->data = vcpu->arch.msr_pkg_energy_status;
> > +               break;
> >         case MSR_IA32_PEBS_ENABLE:
> >         case MSR_IA32_DS_AREA:
> >         case MSR_PEBS_DATA_CFG:
> > --
> > 2.39.0
> >
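
On the open question raised at the top of this mail (how to derive a VM's
proportional share of the package counter), here is a minimal sketch in
plain C. It is not KVM or QEMU code and none of the names below exist in
either tree: rapl_delta, vm_energy_share, vm_rapl_state, vm_rapl_update and
rapl_units_to_uj are hypothetical helpers. It assumes that the energy status
counters are 32-bit wrapping values, that their unit is 1/2^ESU joules with
ESU taken from bits 12:8 of MSR_RAPL_POWER_UNIT (the Intel SDM layout, to be
double-checked for the target CPU), and that some accounting source provides
the VM's CPU time and the total CPU time over the same interval, in
microseconds.

```c
#include <stdint.h>

/* All names here are hypothetical; this is not existing KVM/QEMU API. */

/* Difference between two samples of a 32-bit wrapping counter.
 * Unsigned arithmetic absorbs a single wrap between the samples. */
uint32_t rapl_delta(uint32_t prev, uint32_t cur)
{
    return (uint32_t)(cur - prev);
}

/* Portion of a host energy delta attributed to the VM:
 * host_delta * vm_cpu_us / total_cpu_us.
 * Assumption: vm_cpu_us stays below 2^32 us (about 4294 s) so the
 * 64-bit product cannot overflow. */
uint64_t vm_energy_share(uint32_t host_delta, uint64_t vm_cpu_us,
                         uint64_t total_cpu_us)
{
    if (total_cpu_us == 0)
        return 0;
    if (vm_cpu_us > total_cpu_us)
        vm_cpu_us = total_cpu_us;   /* clamp inconsistent accounting */
    return (uint64_t)host_delta * vm_cpu_us / total_cpu_us;
}

struct vm_rapl_state {
    uint32_t host_prev;  /* previous host counter sample */
    uint64_t guest_acc;  /* energy attributed to the VM so far, host units */
};

/* Fold one sampling interval into the state and return the value a guest
 * would read, truncated to 32 bits so it wraps like the hardware counter. */
uint32_t vm_rapl_update(struct vm_rapl_state *s, uint32_t host_cur,
                        uint64_t vm_cpu_us, uint64_t total_cpu_us)
{
    uint32_t d = rapl_delta(s->host_prev, host_cur);

    s->host_prev = host_cur;
    s->guest_acc += vm_energy_share(d, vm_cpu_us, total_cpu_us);
    return (uint32_t)s->guest_acc;
}

/* Convert a raw counter value to microjoules using the energy status
 * unit (ESU) field, assumed to sit in bits 12:8 of MSR_RAPL_POWER_UNIT. */
uint64_t rapl_units_to_uj(uint32_t raw, uint64_t power_unit_msr)
{
    unsigned int esu = (unsigned int)((power_unit_msr >> 8) & 0x1f);

    return ((uint64_t)raw * 1000000ULL) >> esu;
}
```

The update step is kept separate from whatever reads the guest-visible
value, which matches the point above about not updating from the callback
that consumes the counter. Whether the time accounting would come from the
sched_in/sched_out notifiers or from userspace is left open here.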