On Thu, Dec 5, 2024 at 1:00 AM Nikunj A Dadhania <nikunj@xxxxxxx> wrote:
>
> On 11/22/2024 12:22 AM, Mingwei Zhang wrote:
> > Linux guests read IA32_APERF and IA32_MPERF on every scheduler tick
> > (250 Hz by default) to measure their effective CPU frequency. To avoid
> > the overhead of intercepting these frequent MSR reads, allow the guest
> > to read them directly by loading guest values into the hardware MSRs.
> >
> > These MSRs are continuously running counters whose values must be
> > carefully tracked during all vCPU state transitions:
> > - Guest IA32_APERF advances only during guest execution
> > - Guest IA32_MPERF advances at the TSC frequency whenever the vCPU is
> >   in C0 state, even when not actively running
>
> Any particular reason to treat APERF and MPERF differently?

Core cycles accumulated by the logical processor that do not
contribute to the execution of the virtual processor should not be
counted. For example, consider Google Cloud's e2-small VM type, which
is capped at a 25% duty cycle. Even if the logical processor is
humming along at an effective frequency of 3.6 GHz, an e2-small vCPU
task is only resident 25% of the time, so its effective frequency is
more like 0.9 GHz (over a sufficiently large period of time).
Similarly, if a logical processor running at 3.6 GHz is shared 50/50
by two vCPUs, the effective frequency of each is about 1.8 GHz (again,
over a sufficiently large period of time). Over smaller time periods,
the effective frequencies in these examples would look like square
waves, alternating between 3.6 GHz and 0, much like thermal
throttling. And, much like thermal throttling, MPERF reference cycles
continue to tick on at the fixed reference frequency, even when APERF
cycles drop to 0.

> AFAIU, APERF and MPERF architecturally will count when the CPU is in C0 state.
> MPERF counting at constant frequency and the APERF counting at a variable
> frequency. Shouldn't we treat APERF and MPERF equal and keep on counting in C0
> state and even when "not actively running" ?
>
> Can you clarify what do you mean by "not actively running"?

The current implementation considers the vCPU to be actively running
if the task is in the KVM_RUN ioctl, between vcpu_load() and
vcpu_put(). This also implies that the task itself is currently
running on a logical processor, since there is a vcpu_put() on
sched_out and a vcpu_load() on sched_in.

As Sean points out, this is only an approximation, since (a) such
things as I/O completion in userspace are not counted, and (b) such
things as uncompressing a zswapped page that happen in the vCPU task
are counted.

> Regards
> Nikunj
>
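
The arithmetic behind the examples above (25% duty cycle, 50/50
sharing, square waves over short windows) can be sketched as follows.
This is only an illustration, not code from the patch series: the
function names are hypothetical, and the 2.0 GHz TSC frequency is an
assumed value, since only the 3.6 GHz core frequency is given above.
It uses the usual ratio, effective frequency = TSC frequency *
delta(APERF) / delta(MPERF), with guest APERF advancing only while the
vCPU is resident and guest MPERF advancing for the whole window.

```python
NSEC_PER_SEC = 1_000_000_000

CORE_HZ = 3_600_000_000  # effective core frequency from the examples above
TSC_HZ = 2_000_000_000   # ASSUMPTION: hypothetical TSC/reference frequency


def guest_counter_deltas(core_hz, tsc_hz, window_ns, resident_ns):
    """Return (aperf_delta, mperf_delta) seen by the guest over one window.

    Hypothetical model: guest APERF accumulates core cycles only while the
    vCPU is resident; guest MPERF accumulates reference cycles at the TSC
    frequency for the entire window (vCPU assumed to stay in C0).
    """
    aperf_delta = core_hz * resident_ns // NSEC_PER_SEC
    mperf_delta = tsc_hz * window_ns // NSEC_PER_SEC
    return aperf_delta, mperf_delta


def effective_hz(aperf_delta, mperf_delta, tsc_hz):
    """Effective frequency as a guest would derive it from the two deltas."""
    return tsc_hz * aperf_delta // mperf_delta


def observed_hz(window_ns, resident_ns):
    """Convenience wrapper using the assumed frequencies above."""
    aperf, mperf = guest_counter_deltas(CORE_HZ, TSC_HZ, window_ns, resident_ns)
    return effective_hz(aperf, mperf, TSC_HZ)


if __name__ == "__main__":
    one_sec = NSEC_PER_SEC
    one_ms = 1_000_000
    print("25% duty cycle over 1 s:", observed_hz(one_sec, one_sec // 4))
    print("50/50 sharing over 1 s: ", observed_hz(one_sec, one_sec // 2))
    print("1 ms window, resident:  ", observed_hz(one_ms, one_ms))
    print("1 ms window, scheduled out:", observed_hz(one_ms, 0))
```

Over a long window the result is the core frequency scaled by the
fraction of time the vCPU was resident. Over a window shorter than a
scheduling quantum it is either the full core frequency or 0, which is
the square wave described above. The choice of TSC frequency cancels
out of the ratio, so the assumed 2.0 GHz does not affect the results.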