On Fri, 28 Jun 2024 13:51:06 +0100,
Sudeep Holla <sudeep.holla@xxxxxxx> wrote:
>
> On Mon, May 20, 2024 at 09:30:52PM -0700, David Dai wrote:
> > Introduce a virtualized cpufreq driver for guest kernels to improve
> > performance and power of workloads within VMs.
> >
> > This driver does two main things:
> >
> > 1. Sends the frequency of vCPUs as a hint to the host. The host uses the
> > hint to schedule the vCPU threads and decide physical CPU frequency.
> >
> > 2. If a VM does not support a virtualized FIE(like AMUs), it queries the
> > host CPU frequency by reading a MMIO region of a virtual cpufreq device
> > to update the guest's frequency scaling factor periodically. This enables
> > accurate Per-Entity Load Tracking for tasks running in the guest.
> >
> > +
> > +/*
> > + * CPU0..CPUn
> > + * +-------------+-------------------------------+--------+-------+
> > + * | Register    | Description                   | Offset | Len   |
> > + * +-------------+-------------------------------+--------+-------+
> > + * | cur_perf    | read this register to get     | 0x0    | 0x4   |
> > + * |             | the current perf (integer val |        |       |
> > + * |             | representing perf relative to |        |       |
> > + * |             | max performance)              |        |       |
> > + * |             | that vCPU is running at       |        |       |
> > + * +-------------+-------------------------------+--------+-------+
> > + * | set_perf    | write to this register to set | 0x4    | 0x4   |
> > + * |             | perf value of the vCPU        |        |       |
> > + * +-------------+-------------------------------+--------+-------+
> > + * | perftbl_len | number of entries in perf     | 0x8    | 0x4   |
> > + * |             | table. A single entry in the  |        |       |
> > + * |             | perf table denotes no table   |        |       |
> > + * |             | and the entry contains        |        |       |
> > + * |             | the maximum perf value        |        |       |
> > + * |             | that this vCPU supports.      |        |       |
> > + * |             | The guest can request any     |        |       |
> > + * |             | value between 1 and max perf  |        |       |
> > + * |             | when perftbls are not used.   |        |       |
> > + * +---------------------------------------------+--------+-------+
> > + * | perftbl_sel | write to this register to     | 0xc    | 0x4   |
> > + * |             | select perf table entry to    |        |       |
> > + * |             | read from                     |        |       |
> > + * +---------------------------------------------+--------+-------+
> > + * | perftbl_rd  | read this register to get     | 0x10   | 0x4   |
> > + * |             | perf value of the selected    |        |       |
> > + * |             | entry based on perftbl_sel    |        |       |
> > + * +---------------------------------------------+--------+-------+
> > + * | perf_domain | performance domain number     | 0x14   | 0x4   |
> > + * |             | that this vCPU belongs to.    |        |       |
> > + * |             | vCPUs sharing the same perf   |        |       |
> > + * |             | domain number are part of the |        |       |
> > + * |             | same performance domain.      |        |       |
> > + * +-------------+-------------------------------+--------+-------+
> > + */
>
> I think it is good idea to version this table, so that it gives flexibility
> to update the entries. It is a must if we are getting away with DT. I didn't
> give complete information in my previous response where I agreed with Rafael.
>
> I am not sure how much feasible it is, but can it be queried via KVM IOCTLs
> to VMM. Just a thought, I am exploring how to make this work even on ACPI
> systems. It is simpler if we neednot rely on DT or ACPI.

KVM should not have to know any of this. This is purely a contract (and
a pretty weak one) between userspace and the guest.

	M.

-- 
Without deviation from the norm, progress is not possible.
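
For anyone following the register layout quoted above, here is a rough sketch of the select-then-read sequence that the perftbl_len, perftbl_sel and perftbl_rd descriptions imply. Only the offsets come from the quoted table. The fake_vcpufreq structure, the fake_read32()/fake_write32() accessors and guest_read_perftbl() are hypothetical names invented for illustration; they stand in for the VMM side so the snippet builds in userspace, and none of them appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets, copied from the table in the quoted patch. */
#define REG_CUR_PERF    0x00
#define REG_SET_PERF    0x04
#define REG_PERFTBL_LEN 0x08
#define REG_PERFTBL_SEL 0x0c
#define REG_PERFTBL_RD  0x10
#define REG_PERF_DOMAIN 0x14

/* Hypothetical stand-in for the VMM's state behind one vCPU's registers. */
struct fake_vcpufreq {
	uint32_t cur_perf;
	uint32_t set_perf;     /* last value the guest wrote to set_perf */
	uint32_t perf_domain;
	uint32_t sel;          /* last value the guest wrote to perftbl_sel */
	const uint32_t *tbl;   /* perf table exposed through perftbl_rd */
	uint32_t tbl_len;
};

/* Assumed device behaviour on a 32-bit read at the given offset. */
static uint32_t fake_read32(struct fake_vcpufreq *d, uint32_t off)
{
	switch (off) {
	case REG_CUR_PERF:
		return d->cur_perf;
	case REG_PERFTBL_LEN:
		return d->tbl_len;
	case REG_PERFTBL_RD:
		assert(d->sel < d->tbl_len); /* sketch only: no error path */
		return d->tbl[d->sel];
	case REG_PERF_DOMAIN:
		return d->perf_domain;
	default:
		return 0;
	}
}

/* Assumed device behaviour on a 32-bit write at the given offset. */
static void fake_write32(struct fake_vcpufreq *d, uint32_t off, uint32_t val)
{
	if (off == REG_SET_PERF)
		d->set_perf = val;
	else if (off == REG_PERFTBL_SEL)
		d->sel = val;
}

/*
 * Guest side: copy up to 'cap' perf table entries into 'out' by writing
 * each index to perftbl_sel and reading the value back from perftbl_rd.
 * Returns the number of entries copied. Per the quoted table, a length
 * of 1 means there is no table and out[0] holds the maximum perf value.
 */
static size_t guest_read_perftbl(struct fake_vcpufreq *d, uint32_t *out,
				 size_t cap)
{
	uint32_t len = fake_read32(d, REG_PERFTBL_LEN);
	size_t n = len < cap ? len : cap;

	for (size_t i = 0; i < n; i++) {
		fake_write32(d, REG_PERFTBL_SEL, (uint32_t)i);
		out[i] = fake_read32(d, REG_PERFTBL_RD);
	}
	return n;
}
```

In a real guest the accessors would be MMIO reads and writes against the mapped region rather than calls into a structure. The sketch also says nothing about versioning or discovery, which is the part under discussion in this thread.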