2017-02-02 15:47-0200, Marcelo Tosatti:
> Implement KVM hypercalls for the guest
> to issue frequency changes.
>
> Current situation with DPDK and frequency changes is as follows:
> An algorithm in the guest decides when to increase/decrease
> frequency based on the queue length of the device.

Does the algorithm compute with the magnitude of frequency steps?
(e.g. if CPU can step with 200 MHz granularity, does the algorithm ever
do 400 MHz at once, because it assumes that frequency would be enough to
handle the load?)

> On the host, a power manager daemon is used to listen for
> frequency change requests (on another core) and issue these
> requests.
>
> However frequency changes are performance sensitive events because:
> On a change from low load condition to max load condition,
> the frequency should be raised as soon as possible.
> Sending a virtio-serial notification to another pCPU,
> waiting for that pCPU to initiate an IPI to the requestor pCPU
> to change frequency, is slower and more cache costly than
> a direct hypercall to host to switch the frequency.
>
> If the pCPU where the power manager daemon is running
> is not busy spinning on requests from the isolated DPDK vcpus,
> there is also the cost of HLT wakeup for that pCPU.
>
> Moreover, the daemon serves multiple VMs, meaning that
> the scheme is subject to additional delays from
> queueing of power change requests from VMs.

(Wow, this must be bringing humanity to its doom faster than the heat it
helps to eliminate.)

> A direct hypercall from userspace is the fastest most direct
> method for the guest to change frequency and does not suffer
> from the issues above.

Right, userspace on bare-metal cannot change frequency directly.

> The usage scenario for this hypercalls is for pinned vCPUs <-> pCPUs.

And pinned tasks <-> vCPUs, because the guest kernel has no idea what
frequency is being used or desired on its virtualware, so the kernel
cannot even change frequency without introducing a bug ...

I'm not happy about this hole through the layers of isolation.
The domain of valid users is very small, and the problem is that any
program with access to /dev/kvm gains the ability to change the host CPU
frequency if the host happens to use the userspace governor.
We should at least enable this feature only if /dev/kvm is root-only.
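
To make "root-only" concrete, here is a rough userspace sketch of the
kind of check I have in mind. It is only an illustration: the helper
names are made up, it ignores ACLs and capabilities, and an in-kernel
gate would more likely be a capability check than an inspection of the
device node.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of KVM): true when a node is owned by
 * root and grants no permission bits at all to group or others.
 * POSIX ACLs and capabilities are deliberately ignored in this sketch.
 */
static bool node_is_root_only(uid_t owner, mode_t mode)
{
	return owner == 0 && (mode & (S_IRWXG | S_IRWXO)) == 0;
}

/* Convenience wrapper: stat() the path and apply the check above. */
static bool path_is_root_only(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return false;	/* a missing or unreadable node counts as unsafe */
	return node_is_root_only(st.st_uid, st.st_mode);
}
```

A management tool could call path_is_root_only("/dev/kvm") and refuse
to turn the feature on otherwise. With the common 0660 root:kvm or 0666
setups the check fails, which is the point.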