On 2/10/2022 2:55 PM, David Dunn wrote:
Kan,
On Thu, Feb 10, 2022 at 11:46 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
No, we don't, at least for Linux, because the host owns everything. It
doesn't need the MSR to tell which counter is in use; we track that in
software. For the new request from the guest to own a counter, I guess
it may be worth implementing. But yes, existing/legacy guests never
check the MSR.
This is the expectation of all software that uses the PMU in every
guest. It isn't just the Linux perf system.
The KVM vPMU model we have today results in PMU-utilizing software
simply not working properly in a guest. The only case that can
consistently "work" today is not giving the guest a PMU at all.
And that's why you are hearing requests to gift the entire PMU to the
guest while it is running. All existing PMU software knows about the
various constraints on exactly how each MSR must be used to get sane
data. Gifting the entire PMU allows that software to work properly.
But that has to be controlled by policy at the host level, so that the
owner of the host knows that they are not going to have PMU visibility
into guests that have control of the PMU.
I think this is how a guest event works today with KVM and the perf
subsystem.
- The guest creates an event A.
- The guest kernel assigns a guest counter M to event A and configures
  the related MSRs of guest counter M.
- KVM intercepts the MSR access and creates a host event B. (Host
  event B is based on the settings of guest counter M. As I said, at
  least for Linux, some SW config impacts the counter assignment. KVM
  never knows about it, so event B can only be a similar event to A.)
- The Linux perf subsystem assigns a physical counter N to host event
  B according to event B's constraints. (N may not be the same as M,
  because A and B may have different event constraints.)
As you can see, even if the entire PMU is given to the guest, we still
cannot guarantee that the physical counter M can be assigned to the
guest event A.
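To make the mismatch concrete, here is a toy model of the flow above. It
is only a sketch, not kernel code: the Event class, assign_counter(),
kvm_create_host_event() and the constraint masks are all hypothetical
names and values made up for illustration. The one assumption that
matters is that the host copy B does not carry the same constraint as
the guest event A, because KVM only sees the MSR settings.

```python
from dataclasses import dataclass

NUM_COUNTERS = 4  # assumed size of the PMU in this toy model


@dataclass(frozen=True)
class Event:
    name: str
    allowed: frozenset  # counter indices this event may be scheduled on


def assign_counter(event, busy):
    """Pick the lowest-numbered free counter allowed by the event's constraint."""
    for idx in sorted(event.allowed):
        if idx not in busy:
            return idx
    return None  # nothing free that satisfies the constraint


def kvm_create_host_event(guest_event):
    """Hypothetical stand-in for the KVM step.

    KVM sees only the guest MSR settings, not the guest's software-side
    config, so in this model the host event loses the guest's constraint
    and may be placed on any counter.
    """
    return Event("B (host copy of %s)" % guest_event.name,
                 frozenset(range(NUM_COUNTERS)))


def simulate():
    # Guest side: event A is constrained (by guest SW config) to counters 2 or 3.
    event_a = Event("A", frozenset({2, 3}))
    m = assign_counter(event_a, busy=set())

    # Host side: KVM builds event B, then perf schedules it.
    # No counter is busy on the host, i.e. the whole PMU is free for the guest.
    event_b = kvm_create_host_event(event_a)
    n = assign_counter(event_b, busy=set())
    return m, n


m, n = simulate()
print("guest counter M =", m, "physical counter N =", n)
```

Even with every physical counter free, the guest believes it is on
counter M while perf has placed the host event on a different counter N,
simply because the two schedulers work from different constraints.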
How do we fix it? The only thing I can imagine is "passthrough": let
KVM directly assign counter M to the guest. So, to me, this policy
sounds like letting KVM replace perf in controlling all PMU resources
and then handing them over to the guest. Is that what we want?
Thanks,
Kan