Re: [RFC] KVM: x86: Give host userspace control for MSR_RAPL_POWER_UNIT and MSR_PKG_POWER_STATUS

On Wed, Jan 18, 2023 at 3:21 PM Anthony Harivel <aharivel@xxxxxxxxxx> wrote:
>
> Allow userspace to update the MSR_RAPL_POWER_UNIT and
> MSR_PKG_ENERGY_STATUS powercap registers. By default, these MSRs still
> return 0.
>
> This enables VMMs running on top of KVM that have access to energy
> metrics such as /sys/devices/virtual/powercap/*/*/energy_uj to compute
> per-VM power values in proportion to other metrics (e.g. CPU %guest,
> steal time, etc.) and to periodically update the MSRs via the
> KVM_SET_MSRS ioctl, so that the guest OS can consume them using power
> metering tools.
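
For illustration, the userspace side described in the quoted paragraph
could look roughly like the sketch below. It is not taken from the patch
or from any QEMU prototype. The helper names and the attribution policy
(scaling the host energy delta by the VM's share of CPU time) are
assumptions made up for this example. The bit layout follows the usual
RAPL convention: the energy status unit sits in bits 12:8 of
MSR_RAPL_POWER_UNIT, one counter unit is 1/2^ESU joules, and
MSR_PKG_ENERGY_STATUS is a wrapping 32-bit counter. The final
KVM_SET_MSRS call is left out because it needs a live vCPU file
descriptor.

```c
#include <assert.h>
#include <stdint.h>

#define RAPL_ESU_SHIFT 8        /* energy status unit: bits 12:8 */
#define RAPL_ESU_MASK  0x1fULL

/* Build a MSR_RAPL_POWER_UNIT value that carries only the energy unit. */
static uint64_t rapl_power_unit(unsigned int esu)
{
    assert(esu <= RAPL_ESU_MASK);
    return ((uint64_t)esu & RAPL_ESU_MASK) << RAPL_ESU_SHIFT;
}

/*
 * Hypothetical attribution policy: give the VM a share of the host's
 * energy delta equal to its share of CPU time over the same interval.
 */
static uint64_t vm_energy_uj(uint64_t host_delta_uj,
                             uint64_t vm_cpu_ns, uint64_t total_cpu_ns)
{
    if (total_cpu_ns == 0)
        return 0;
    if (vm_cpu_ns > total_cpu_ns)
        vm_cpu_ns = total_cpu_ns;
    return (uint64_t)((double)host_delta_uj *
                      ((double)vm_cpu_ns / (double)total_cpu_ns));
}

/* Convert microjoules to RAPL counter units of 1/2^esu joules. */
static uint64_t uj_to_rapl_units(uint64_t uj, unsigned int esu)
{
    return (uj << esu) / 1000000ULL;
}

/* Advance the 32-bit energy counter, wrapping as the hardware one does. */
static uint64_t advance_energy_status(uint64_t counter, uint64_t units)
{
    return (counter + units) & 0xffffffffULL;
}
```

A VMM would run this once per sampling interval: read the host
energy_uj delta, attribute a share to the VM, convert it to counter
units, advance the stored counter, and write the result to the vCPU
with KVM_SET_MSRS.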

Do you have a QEMU prototype patch? My gut feeling is that these MSRs
should be handled entirely within KVM using the sched_in and sched_out
notifiers. This would also allow exposing the values to the host using
the statistics subsystem.

Paolo

> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Christophe Fontaine <cfontain@xxxxxxxxxx>
> Signed-off-by: Anthony Harivel <aharivel@xxxxxxxxxx>
> ---
>
> Notes:
>     The main goal of this patch is to take a first step toward giving
>     VMs energy awareness.
>
>     As of today, KVM always reports 0 in these MSRs since the entire host
>     power consumption needs to be hidden from the guests. However, there is
>     no fallback mechanism for VMs to measure their power usage.
>
>     The idea is to let the VMMs running on top of KVM periodically update
>     those MSRs with representative values of the VM's power consumption.
>
>     If this solution is accepted, VMMs like QEMU will need to be patched to
>     set proper values in these registers and enable power metering in
>     guests.
>
>     I am submitting this as an RFC to get input/feedback from a broader
>     audience who may be aware of potential side effects of such a mechanism.
>
>     Regards,
>     Anthony
>
>     "If you can’t measure it, you can’t improve it." – Lord Kelvin
>
>  arch/x86/include/asm/kvm_host.h |  4 ++++
>  arch/x86/kvm/x86.c              | 18 ++++++++++++++++--
>  2 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 6aaae18f1854..c6072915f229 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1006,6 +1006,10 @@ struct kvm_vcpu_arch {
>          */
>         bool pdptrs_from_userspace;
>
> +       /* Powercap related MSRs */
> +       u64 msr_rapl_power_unit;
> +       u64 msr_pkg_energy_status;
> +
>  #if IS_ENABLED(CONFIG_HYPERV)
>         hpa_t hv_root_tdp;
>  #endif
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index da4bbd043a7b..adc89144f84f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1528,6 +1528,10 @@ static const u32 emulated_msrs_all[] = {
>
>         MSR_K7_HWCR,
>         MSR_KVM_POLL_CONTROL,
> +
> +       /* The following MSRs can be updated by the userspace */
> +       MSR_RAPL_POWER_UNIT,
> +       MSR_PKG_ENERGY_STATUS,
>  };
>
>  static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)];
> @@ -3888,6 +3892,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>                  * as to-be-saved, even if an MSRs isn't fully supported.
>                  */
>                 return !msr_info->host_initiated || data;
> +       case MSR_RAPL_POWER_UNIT:
> +               vcpu->arch.msr_rapl_power_unit = data;
> +               break;
> +       case MSR_PKG_ENERGY_STATUS:
> +               vcpu->arch.msr_pkg_energy_status = data;
> +               break;
>         default:
>                 if (kvm_pmu_is_valid_msr(vcpu, msr))
>                         return kvm_pmu_set_msr(vcpu, msr_info);
> @@ -3973,13 +3983,17 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>          * data here. Do not conditionalize this on CPUID, as KVM does not do
>          * so for existing CPU-specific MSRs.
>          */
> -       case MSR_RAPL_POWER_UNIT:
>         case MSR_PP0_ENERGY_STATUS:     /* Power plane 0 (core) */
>         case MSR_PP1_ENERGY_STATUS:     /* Power plane 1 (graphics uncore) */
> -       case MSR_PKG_ENERGY_STATUS:     /* Total package */
>         case MSR_DRAM_ENERGY_STATUS:    /* DRAM controller */
>                 msr_info->data = 0;
>                 break;
> +       case MSR_RAPL_POWER_UNIT:
> +               msr_info->data = vcpu->arch.msr_rapl_power_unit;
> +               break;
> +       case MSR_PKG_ENERGY_STATUS:     /* Total package */
> +               msr_info->data = vcpu->arch.msr_pkg_energy_status;
> +               break;
>         case MSR_IA32_PEBS_ENABLE:
>         case MSR_IA32_DS_AREA:
>         case MSR_PEBS_DATA_CFG:
> --
> 2.39.0
>




