On Thu, Sep 15, 2022, Dapeng Mi wrote:
> Halt polling is enabled by default even through the CPU frequency
> governor is configured to powersave. Generally halt polling would
> consume extra power and this's not identical with the intent of
> powersave governor.
>
> disabling halt polling in powersave governor can save the precious
> power in power critical case.
>
> FIO random read test on Alder Lake platform shows halt polling
> occupies ~17% CPU utilization and consume 7% extra CPU power.
> After disabling halt polling, CPU has more chance to enter deeper
> C-states (C1E%: 25.3% -> 33.4%, C10%: 4.4% -> 17.4%).
>
> On Alder Lake platform, we don't find there are obvious performance
> downgrade after disabling halt polling on FIO and Netperf cases.
> Netperf UDP_RR case runs from two VMs locate on two different physical
> machines.
>
> FIO(MB/s)    Base     Disable-halt-polling    Delta%
> Rand-read    432.6    436.3                   0.8%
>
> Netperf      Base     Disable-halt-polling    Delta%
> UDP_RR       509.8    508.5                   -0.3%
>
> Signed-off-by: Dapeng Mi <dapeng1.mi@xxxxxxxxx>
> ---
>  arch/x86/kvm/x86.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d7374d768296..c0eb6574cbbb 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13015,7 +13015,22 @@ bool kvm_vector_hashing_enabled(void)
>
>  bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
>  {
> -	return (vcpu->arch.msr_kvm_poll_control & 1) == 0;
> +	struct cpufreq_policy *policy = cpufreq_cpu_get(vcpu->cpu);

Preemption is not disabled at this point, which means that using vcpu->cpu is
potentially unsafe.  Given that cpufreq is refcounting the returned object, I
gotta imagine getting migrated to a different pCPU would be problematic.

> +	bool powersave = false;

I don't see anything in here that's x86 specific.  Unless I'm missing something,
this belongs in common KVM.

> +
> +	/*
> +	 * Halt polling could consume much CPU power, if CPU frequency
> +	 * governor is set to "powersave", disable halt polling.
> +	 */
> +	if (policy) {
> +		if ((policy->policy == CPUFREQ_POLICY_POWERSAVE) ||
> +			(policy->governor &&

Indentation is messed up.

> +			!strncmp(policy->governor->name, "powersave",

KVM should not be comparing magic strings.  If the cpufreq subsystem can't get
policy->policy right, then that needs to be fixed.

> +				CPUFREQ_NAME_LEN)))
> +			powersave = true;
> +		cpufreq_cpu_put(policy);
> +	}
> +	return ((vcpu->arch.msr_kvm_poll_control & 1) == 0) || powersave;

Doing all of the above work if polling is disabled is silly.

>  }
>  EXPORT_SYMBOL_GPL(kvm_arch_no_poll);

All in all, _if_ we want to do this automatically and not let userspace decide
how to manage powersave vs. halt-poll, I think this should be more like:

---
 virt/kvm/kvm_main.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e30f1b4ecfa5..01116859cb31 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -29,6 +29,7 @@
 #include <linux/file.h>
 #include <linux/syscore_ops.h>
 #include <linux/cpu.h>
+#include <linux/cpufreq.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/mm.h>
 #include <linux/sched/stat.h>
@@ -3483,6 +3484,23 @@ static inline void update_halt_poll_stats(struct kvm_vcpu *vcpu, ktime_t start,
 	}
 }
 
+static bool kvm_cpufreq_no_halt_poll(struct kvm_vcpu *vcpu)
+{
+	struct cpufreq_policy *policy;
+	bool powersave = false;
+
+	preempt_disable();
+
+	policy = cpufreq_cpu_get(vcpu->cpu);
+	if (policy) {
+		powersave = (policy->policy == CPUFREQ_POLICY_POWERSAVE);
+		cpufreq_cpu_put(policy);
+	}
+
+	preempt_enable();
+	return powersave;
+}
+
 /*
  * Emulate a vCPU halt condition, e.g. HLT on x86, WFI on arm, etc...  If halt
  * polling is enabled, busy wait for a short time before blocking to avoid the
@@ -3491,7 +3509,8 @@ static inline void update_halt_poll_stats(struct kvm_vcpu *vcpu, ktime_t start,
  */
 void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
 {
-	bool halt_poll_allowed = !kvm_arch_no_poll(vcpu);
+	const bool halt_poll_allowed = !kvm_arch_no_poll(vcpu) &&
+				       !kvm_cpufreq_no_halt_poll(vcpu);
 	bool do_halt_poll = halt_poll_allowed && vcpu->halt_poll_ns;
 	ktime_t start, cur, poll_end;
 	bool waited = false;

base-commit: e18d6152ff0f41b7f01f9817372022df04e0d354
--
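
For illustration only, the evaluation-order point (skip the cpufreq lookup
entirely when the arch check has already ruled out polling) can be modeled in
plain userspace C.  This is a rough sketch, not kernel code: struct fake_vcpu,
the fake_*() helpers, and the cpufreq_lookups counter are all hypothetical
stand-ins, and the real cpufreq_cpu_get()/cpufreq_cpu_put() pair is reduced to
a counter increment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter: how many times the "expensive" cpufreq check ran. */
static int cpufreq_lookups;

/* Hypothetical stand-in for struct kvm_vcpu; only models the two inputs. */
struct fake_vcpu {
	bool arch_no_poll;	/* models kvm_arch_no_poll() returning true */
	bool powersave;		/* models policy->policy == CPUFREQ_POLICY_POWERSAVE */
};

static bool fake_arch_no_poll(const struct fake_vcpu *vcpu)
{
	return vcpu->arch_no_poll;
}

static bool fake_cpufreq_no_halt_poll(const struct fake_vcpu *vcpu)
{
	/* Stands in for the cpufreq_cpu_get()/cpufreq_cpu_put() round trip. */
	cpufreq_lookups++;
	return vcpu->powersave;
}

/* Same shape as the halt_poll_allowed expression in the diff above. */
static bool halt_poll_allowed(const struct fake_vcpu *vcpu)
{
	return !fake_arch_no_poll(vcpu) && !fake_cpufreq_no_halt_poll(vcpu);
}

/* Walks the three interesting cases and checks when the lookup happens. */
void demo_ordering(void)
{
	struct fake_vcpu arch_off  = { .arch_no_poll = true,  .powersave = true  };
	struct fake_vcpu polling   = { .arch_no_poll = false, .powersave = false };
	struct fake_vcpu powersave = { .arch_no_poll = false, .powersave = true  };

	cpufreq_lookups = 0;

	/* Arch already disabled polling: && short-circuits, no lookup. */
	assert(!halt_poll_allowed(&arch_off));
	assert(cpufreq_lookups == 0);

	/* Arch allows polling: the cpufreq check runs and permits polling. */
	assert(halt_poll_allowed(&polling));
	assert(cpufreq_lookups == 1);

	/* Arch allows polling, but powersave vetoes it. */
	assert(!halt_poll_allowed(&powersave));
	assert(cpufreq_lookups == 2);
}
```

Putting the cheap check on the left of && is what avoids the refcount traffic
for vCPUs that never poll; the model says nothing about the preemption issue,
which only exists in the real kernel context.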