Hi David,

On 11/11/2023 01:49, David Dai wrote:
> Hi,
>
> This patch series is a continuation of the talk Saravana gave at LPC 2022 titled "CPUfreq/sched and VM guest workload problems" [1][2][3]. The gist of the talk is that workloads running in a guest VM get terrible task placement and CPUfreq behavior when compared to running the same workload in the host. Effectively, there is no EAS (Energy Aware Scheduling) for threads inside VMs. This makes power and performance terrible just by running the workload in a VM, even if we assume there is zero virtualization overhead.
>
> With this series, a workload running in a VM gets the same task placement and CPUfreq behavior as it would when running in the host.
>
> The idea is to improve VM CPUfreq/sched behavior by:
> - Having the guest kernel do accurate load tracking by taking host CPU arch/type and frequency into account.
> - Sharing vCPU frequency requirements with the host so that the host can do proper frequency scaling and task placement on the host side.
>
> Based on feedback on the RFC v1 proposal [4], we've revised our implementation to use MMIO reads and writes to pass information to/from the host instead of using hypercalls. In our example, the VMM (Virtual Machine Manager) translates the frequency requests into uclamp_min and applies it to the vCPU thread as a hint to the host kernel.
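(As an aside, for readers less familiar with the uclamp interface: the VMM-side hinting described above presumably boils down to something like the sketch below, i.e. mapping a requested frequency onto the 0..1024 utilization range and applying it as a uclamp_min hint on the vCPU thread via sched_setattr(). The helper names, the linear freq-to-util mapping and the one-host-thread-per-vCPU assumption are illustrative guesses, not code taken from this series.)

/*
 * Illustrative sketch only: how a VMM could turn a guest vCPU frequency
 * request into a uclamp_min hint on the host thread backing that vCPU.
 */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SCHED_FLAG_KEEP_POLICY
#define SCHED_FLAG_KEEP_POLICY		0x08
#define SCHED_FLAG_KEEP_PARAMS		0x10
#endif
#ifndef SCHED_FLAG_UTIL_CLAMP_MIN
#define SCHED_FLAG_UTIL_CLAMP_MIN	0x20
#endif

/* Mirrors the uapi struct sched_attr (see linux/sched/types.h). */
struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
	uint32_t sched_util_min;
	uint32_t sched_util_max;
};

/* Scale a requested frequency into the [0..1024] utilization range (assumes max_freq_khz != 0). */
uint32_t freq_to_util(uint64_t freq_khz, uint64_t max_freq_khz)
{
	uint64_t util = (freq_khz * 1024) / max_freq_khz;

	return util > 1024 ? 1024 : (uint32_t)util;
}

/* Hypothetical helper: apply the guest's frequency request as a uclamp_min hint. */
int apply_vcpu_freq_hint(pid_t vcpu_tid, uint64_t freq_khz, uint64_t max_freq_khz)
{
	struct sched_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	/* Keep the existing policy/params, only update the minimum utilization clamp. */
	attr.sched_flags = SCHED_FLAG_KEEP_POLICY | SCHED_FLAG_KEEP_PARAMS |
			   SCHED_FLAG_UTIL_CLAMP_MIN;
	attr.sched_util_min = freq_to_util(freq_khz, max_freq_khz);

	return syscall(SYS_sched_setattr, vcpu_tid, &attr, 0);
}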
Sorry for not noticing this series until now.

The problem you are having with uclamp is actually the same one I'm tackling right now. Basically my conclusion so far is that uclamp max aggregation has quite a few problems which can be easily solved by sum aggregation (summing up the clamped utilization values instead of applying the max uclamp value to the whole rq):

https://lore.kernel.org/all/cover.1696345700.git.Hongyan.Xia2@xxxxxxx/

What you describe as util_guest sounds to me like exactly what uclamp_min under sum aggregation does. I'm really tempted to ask you to apply my series and see whether the new uclamp_min does what you want, instead of introducing a new util_guest signal. If you don't have time for this, I can try to replicate your setup and do the experiments myself.

Also, my knowledge of KVM is limited. May I know where the vCPU fork happens? Can't you just set the p->sched_reset_on_fork flag on fork so that the uclamp values are not carried forward?
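To make the max vs sum aggregation distinction above concrete, here is a small standalone toy model (not kernel code; the task numbers are made up for illustration) showing how the two schemes report runqueue utilization for a heavily boosted vCPU thread plus a small background task:

/*
 * Toy model (not kernel code) contrasting the two aggregation schemes
 * for a runqueue with two runnable tasks.
 */
#include <stdio.h>

struct task { unsigned int util, uclamp_min, uclamp_max; };

static unsigned int clamp(unsigned int v, unsigned int lo, unsigned int hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

int main(void)
{
	/* e.g. a vCPU thread with a large uclamp_min hint, plus a small background task */
	struct task tasks[] = {
		{ .util = 300, .uclamp_min = 700, .uclamp_max = 1024 },
		{ .util =  50, .uclamp_min =   0, .uclamp_max = 1024 },
	};
	unsigned int n = sizeof(tasks) / sizeof(tasks[0]);
	unsigned int sum_util = 0, rq_min = 0, rq_max = 0, sum_clamped = 0;

	for (unsigned int i = 0; i < n; i++) {
		/* Max aggregation: rq-wide clamps are the max over runnable
		 * tasks, applied to the summed (unclamped) utilization. */
		sum_util += tasks[i].util;
		if (tasks[i].uclamp_min > rq_min)
			rq_min = tasks[i].uclamp_min;
		if (tasks[i].uclamp_max > rq_max)
			rq_max = tasks[i].uclamp_max;

		/* Sum aggregation: clamp each task first, then sum. */
		sum_clamped += clamp(tasks[i].util, tasks[i].uclamp_min,
				     tasks[i].uclamp_max);
	}

	printf("max aggregation : %u\n", clamp(sum_util, rq_min, rq_max));
	printf("sum aggregation : %u\n", sum_clamped);
	return 0;
}

In this toy example max aggregation reports 700 (the background task's utilization is absorbed by the rq-wide clamp), while sum aggregation reports 750, which is why uclamp_min under sum aggregation behaves much like an additive util_guest contribution.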
[...]
Hongyan