On Wed, Jan 18, 2023 at 3:21 PM Anthony Harivel <aharivel@xxxxxxxxxx> wrote:
>
> Allow userspace to update the MSR_RAPL_POWER_UNIT and
> MSR_PKG_ENERGY_STATUS powercap registers. By default, these MSRs still
> return 0.
>
> This enables VMMs running on top of KVM with access to energy metrics
> like /sys/devices/virtual/powercap/*/*/energy_uj to compute VMs' power
> values in proportion with other metrics (e.g. CPU %guest, steal time,
> etc.) and periodically update the MSRs with ioctl KVM_SET_MSRS so that
> the guest OS can consume them using power metering tools.

Do you have a QEMU prototype patch? My gut feeling is that these MSRs
should be handled entirely within KVM using the sched_in and sched_out
notifiers. This would also allow exposing the values to the host using
the statistics subsystem.

Paolo

> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Christophe Fontaine <cfontain@xxxxxxxxxx>
> Signed-off-by: Anthony Harivel <aharivel@xxxxxxxxxx>
> ---
>
> Notes:
>     The main goal of this patch is to bring a first step to give energy
>     awareness to VMs.
>
>     As of today, KVM always reports 0 in these MSRs since the entire host
>     power consumption needs to be hidden from the guests. However, there is
>     no fallback mechanism for VMs to measure their power usage.
>
>     The idea is to let the VMMs running on top of KVM periodically update
>     those MSRs with representative values of the VM's power consumption.
>
>     If this solution is accepted, VMMs like QEMU will need to be patched to
>     set proper values in these registers and enable power metering in
>     guests.
>
>     I am submitting this as an RFC to get input/feedback from a broader
>     audience who may be aware of potential side effects of such a mechanism.
>
>     Regards,
>     Anthony
>
>     "If you can’t measure it, you can’t improve it."
>     – Lord Kelvin
>
>  arch/x86/include/asm/kvm_host.h |  4 ++++
>  arch/x86/kvm/x86.c              | 18 ++++++++++++++++--
>  2 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 6aaae18f1854..c6072915f229 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1006,6 +1006,10 @@ struct kvm_vcpu_arch {
>          */
>         bool pdptrs_from_userspace;
>
> +       /* Powercap related MSRs */
> +       u64 msr_rapl_power_unit;
> +       u64 msr_pkg_energy_status;
> +
>  #if IS_ENABLED(CONFIG_HYPERV)
>         hpa_t hv_root_tdp;
>  #endif
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index da4bbd043a7b..adc89144f84f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1528,6 +1528,10 @@ static const u32 emulated_msrs_all[] = {
>
>         MSR_K7_HWCR,
>         MSR_KVM_POLL_CONTROL,
> +
> +       /* The following MSRs can be updated by the userspace */
> +       MSR_RAPL_POWER_UNIT,
> +       MSR_PKG_ENERGY_STATUS,
>  };
>
>  static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)];
> @@ -3888,6 +3892,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>                  * as to-be-saved, even if an MSRs isn't fully supported.
>                  */
>                 return !msr_info->host_initiated || data;
> +       case MSR_RAPL_POWER_UNIT:
> +               vcpu->arch.msr_rapl_power_unit = data;
> +               break;
> +       case MSR_PKG_ENERGY_STATUS:
> +               vcpu->arch.msr_pkg_energy_status = data;
> +               break;
>         default:
>                 if (kvm_pmu_is_valid_msr(vcpu, msr))
>                         return kvm_pmu_set_msr(vcpu, msr_info);
> @@ -3973,13 +3983,17 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>          * data here. Do not conditionalize this on CPUID, as KVM does not do
>          * so for existing CPU-specific MSRs.
>          */
> -       case MSR_RAPL_POWER_UNIT:
>         case MSR_PP0_ENERGY_STATUS:     /* Power plane 0 (core) */
>         case MSR_PP1_ENERGY_STATUS:     /* Power plane 1 (graphics uncore) */
> -       case MSR_PKG_ENERGY_STATUS:     /* Total package */
>         case MSR_DRAM_ENERGY_STATUS:    /* DRAM controller */
>                 msr_info->data = 0;
>                 break;
> +       case MSR_RAPL_POWER_UNIT:
> +               msr_info->data = vcpu->arch.msr_rapl_power_unit;
> +               break;
> +       case MSR_PKG_ENERGY_STATUS:     /* Total package */
> +               msr_info->data = vcpu->arch.msr_pkg_energy_status;
> +               break;
>         case MSR_IA32_PEBS_ENABLE:
>         case MSR_IA32_DS_AREA:
>         case MSR_PEBS_DATA_CFG:
> --
> 2.39.0
>
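
For reference, the arithmetic a VMM would need on the userspace side of this proposal could look roughly like the sketch below. It is not taken from the patch or from QEMU. The helper names and the CPU-time-proportional attribution policy are assumptions made for illustration. The MSR indices and field layout follow the Intel SDM: the energy status unit (ESU) sits in bits 12:8 of MSR_RAPL_POWER_UNIT, one counter unit is 1/2^ESU joules, and MSR_PKG_ENERGY_STATUS is a wrapping 32-bit counter.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural MSR indices, as listed in the Intel SDM. */
#define MSR_RAPL_POWER_UNIT   0x606
#define MSR_PKG_ENERGY_STATUS 0x611

/*
 * Build a MSR_RAPL_POWER_UNIT value that only fills in the energy
 * status unit (bits 12:8). Power and time units are left at 0 here,
 * which is a simplification of this sketch.
 */
static uint64_t rapl_power_unit(unsigned esu)
{
    assert(esu < 32); /* ESU is a 5-bit field */
    return (uint64_t)esu << 8;
}

/*
 * Hypothetical policy: charge the guest a share of the host package
 * energy delta (from powercap energy_uj) equal to its share of CPU time.
 */
static uint64_t guest_energy_uj(uint64_t host_delta_uj,
                                uint64_t guest_cpu_ns,
                                uint64_t total_cpu_ns)
{
    if (total_cpu_ns == 0)
        return 0;
    if (guest_cpu_ns > total_cpu_ns)
        guest_cpu_ns = total_cpu_ns;
    /* Floating point avoids overflow in the intermediate product. */
    return (uint64_t)((double)host_delta_uj * (double)guest_cpu_ns /
                      (double)total_cpu_ns);
}

/*
 * Advance the guest-visible package energy counter by energy_uj.
 * One counter unit is 1e6 / 2^esu microjoules. The fractional remainder
 * is dropped on each call, so a real VMM would want to carry it over.
 * Assumes energy_uj << esu does not overflow 64 bits.
 */
static uint64_t pkg_energy_advance(uint64_t msr, uint64_t energy_uj,
                                   unsigned esu)
{
    assert(esu < 32);
    uint64_t units = (energy_uj << esu) / 1000000u;
    return (msr + units) & 0xffffffffu; /* counter wraps at 32 bits */
}
```

With the patch applied, the two resulting values would be written per vCPU through KVM_SET_MSRS, using the indices above in the struct kvm_msr_entry array. How often to sample, and whether CPU time is a good enough proxy for energy, are open questions this sketch does not answer.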