On Mon, Oct 2, 2023 at 6:30 AM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> The host OS shouldn't offer facilities that severely limit its own capabilities,
> when there's a better solution. We don't give the FPU to apps exclusively either,
> it would be insanely stupid for a platform to do that.
>

If you think of the guest VM as a usermode application (which it
effectively is), the analogous situation is that there is no way to tell
the usermode application which portions of the FPU state might be used by
the kernel without context switching. Although the kernel can and does
use FPU state, it doesn't zero out a portion of that state whenever the
kernel needs to use the FPU.

Today there is no way for a guest to dynamically adjust which PMU state
is valid or invalid. And this changes based on usage by other commands
run on the host. As observed by the perf subsystem running in the guest
kernel, this looks like counters that simply zero out and stop counting
at random.

I think the request here is that there be a way for KVM to tell the guest
kernel (running the perf subsystem) that it has a functional HW PMU, and
for that to be true. This doesn't mean taking away the use of the PMU
any more than exposing the FPU to usermode applications means taking away
the FPU from the kernel.

But it does mean that when entering the KVM run loop, the host perf
system needs to context switch away the host PMU state and allow KVM to
load the guest PMU state. And much like the FPU situation, the portion
of the host kernel that runs between the context switch to the KVM thread
and VMENTER to the guest cannot use the PMU.

This obviously should be a policy set by the host owner. They are
deliberately giving up the ability to profile that small portion of the
host (the KVM VCPU thread cannot be profiled) in return for providing a
full set of perf functionality to the guest kernel.

Dave Dunn