On 4/23/2024 11:26 AM, maobibo wrote:
On 2024/4/23 11:13 AM, Mi, Dapeng wrote:
On 4/23/2024 10:53 AM, maobibo wrote:
On 2024/4/23 10:44 AM, Mi, Dapeng wrote:
On 4/23/2024 9:01 AM, maobibo wrote:
On 2024/4/23 1:01 AM, Sean Christopherson wrote:
On Mon, Apr 22, 2024, maobibo wrote:
On 2024/4/16 6:45 AM, Sean Christopherson wrote:
On Mon, Apr 15, 2024, Mingwei Zhang wrote:
On Mon, Apr 15, 2024 at 10:38 AM Sean Christopherson
<seanjc@xxxxxxxxxx> wrote:
One of my biggest complaints with the current vPMU code is that the roles and responsibilities between KVM and perf are poorly defined, which leads to suboptimal and hard-to-maintain code.
Case in point, I'm pretty sure leaving guest values in PMCs _would_ leak guest state to userspace processes that have RDPMC permissions, as the PMCs might not be dirty from perf's perspective (see perf_clear_dirty_counters()).

Blindly clearing PMCs in KVM "solves" that problem, but in doing so makes the overall code brittle because it's not clear whether KVM _needs_ to clear PMCs, or if KVM is just being paranoid.
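For concreteness, a minimal sketch of what such blind clearing could look like on the KVM side (illustration only, not the patchset's actual code; it assumes the architectural MSR_IA32_PMC0-based counter layout and normal kernel context for wrmsrl()):

#include <asm/msr.h>

/*
 * Illustrative sketch: unconditionally zero the general-purpose PMCs after
 * the guest has run, so stale guest values cannot be read by a userspace
 * task with RDPMC permissions.  perf_clear_dirty_counters() does something
 * similar for counters perf considers dirty; this does it blindly from KVM.
 */
static void kvm_pmu_blindly_clear_gp_counters(int nr_gp_counters)
{
        int i;

        for (i = 0; i < nr_gp_counters; i++)
                wrmsrl(MSR_IA32_PMC0 + i, 0);
}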
So once this rolls out, perf and the vPMU become direct clients of the PMU HW.
I don't think this is a statement we want to make, as it opens a discussion that we won't win. Nor do I think it's one we *need* to make. KVM doesn't need to be on equal footing with perf in terms of owning/managing PMU hardware; KVM just needs a few APIs that allow it to faithfully and accurately virtualize a guest PMU.
Faithful clearing (blind clearing) has to be the baseline implementation until both clients agree to a "deal" between them. Currently there is no such deal, but I believe we could reach one via future discussion.
What I am saying is that there needs to be a "deal" in place before this code is merged. It doesn't need to be anything fancy, e.g. perf can still pave over PMCs it doesn't immediately load, as opposed to using cpu_hw_events.dirty to lazily do the clearing. But perf and KVM need to work together from the get-go, i.e. I don't want KVM doing something without regard to what perf does, and vice versa.
There is a similar issue with the LoongArch vPMU, where the VM can access PMU hardware directly and the PMU HW is shared between guest and host. Besides context switches, there are other places where the perf core accesses the PMU HW, such as the tick timer/hrtimer/IPI function-call paths, and KVM can only intercept the context switch.
Two questions:
1) Can KVM prevent the guest from accessing the PMU?
2) If so, can KVM grant partial access to the PMU, or is it all or nothing?
If the answer to both questions is "yes", then it sounds like
LoongArch *requires*
mediated/passthrough support in order to virtualize its PMU.
Hi Sean,
Thanks for your quick response.
Yes, KVM can prevent the guest from accessing the PMU and can grant partial or full access to the PMU. However, once a PMU event is granted to the VM, the host cannot access that PMU event again; a PMU event switch is needed if the host wants it back.
A PMU event is a software entity which won't be shared. Did you mean that if a PMU HW counter is granted to the VM, then the host can't access that PMU HW counter?
Yes. If a PMU HW counter/control is granted to the VM, its value comes from the guest and is not meaningful to the host. The host perf core does not know that it has been granted to the VM; the host still thinks it owns the PMU.
That's one issue this patchset tries to solve. The new mediated x86 vPMU framework doesn't allow the host and guest to own the PMU HW resource simultaneously. Only when there is no !exclude_guest event on the host is the guest allowed to exclusively own the PMU HW resource.
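Roughly, that ownership rule could be sketched as follows (perf_host_has_include_guest_events() is a hypothetical helper used only for illustration; perf exports no such function today):

/*
 * Sketch of the exclusive-ownership rule: the guest may own the PMU HW only
 * while the host has no active events that also count in guest mode, i.e.
 * no events with !exclude_guest.
 */
static bool guest_can_own_pmu_hw(void)
{
        /* Any host event that counts while the guest runs forbids pass-through. */
        if (perf_host_has_include_guest_events())
                return false;

        return true;
}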
Just like the FPU registers, the PMU is shared by the VM and host at different times and is switched lazily. But if an IPI or timer interrupt uses the FPU registers on the host, there will be the same issue.
I didn't fully get your point. When an IPI or timer interrupt arrives, a VM-exit is triggered so the CPU first traps into the host, and then the host interrupt handler is called. Or are you concerned about the ordering between switching the guest PMU MSRs and running these interrupt handlers?
It is not necessary to save/restore the PMU HW at every VM exit; it is better to save/restore it lazily, e.g. only when the vCPU thread is sched-out/sched-in, otherwise the cost will be a little expensive.
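A minimal sketch of that lazy scheme (the kvm_pmu_*_guest_context() helpers are hypothetical; kvm_arch_vcpu_put()/kvm_arch_vcpu_load() are the existing KVM hooks that run when the vCPU thread is scheduled out/in):

/*
 * Sketch only, not actual patchset code: keep the guest PMU state live
 * across ordinary VM exits and switch it only at vCPU sched-out/sched-in.
 */
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
{
        /* existing sched-out work elided */
        kvm_pmu_save_guest_context(vcpu);       /* hand the PMU HW back to host perf */
}

void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
        /* existing sched-in work elided */
        kvm_pmu_restore_guest_context(vcpu);    /* reclaim the PMU HW for the guest */
}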
I doubt this optimization, deferring the guest PMU state save/restore to the vCPU task-switch boundary, would really land in KVM, since it would make the host lose the capability to profile KVM itself, and it seems Sean objects to this.
I know little about the perf core. However, the PMU HW is also accessed in interrupt context. That means PMU HW accesses in normal (non-interrupt) context should be done with IRQs disabled, otherwise there may be nested PMU HW accesses. Is that true?
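If so, the switch itself would presumably need to run with IRQs disabled, something like this small sketch (the save/load helpers are hypothetical placeholders for the actual MSR accesses):

/*
 * Sketch: do the guest/host PMU switch with interrupts disabled so that a
 * tick timer, hrtimer or IPI handler running perf code cannot interleave
 * with half-switched PMU state.
 */
static void kvm_pmu_switch_to_host(struct kvm_vcpu *vcpu)
{
        unsigned long flags;

        local_irq_save(flags);
        kvm_pmu_save_guest_context(vcpu);       /* stash guest PMCs/event selects */
        kvm_pmu_load_host_context(vcpu);        /* reprogram host perf state */
        local_irq_restore(flags);
}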
I wasn't aware that the timer irq handler accesses PMU MSRs. Could you please point me to the code? I'd like to look at it first. Thanks.
Regards
Bibo Mao
Can we add a callback handler to the kvm_guest_cbs structure? Something like this:
@@ -6403,6 +6403,7 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
 	.state = kvm_guest_state,
 	.get_ip = kvm_guest_get_ip,
 	.handle_intel_pt_intr = NULL,
+	.lose_pmu = kvm_guest_lose_pmu,
 };
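To make the idea concrete, the KVM side of such a callback might look roughly like this (neither .lose_pmu nor kvm_guest_lose_pmu() exists upstream, and the use of KVM_REQ_PMU here is purely illustrative):

/*
 * Hypothetical callback invoked by perf when the host needs the PMU back
 * while it is granted to a guest.  KVM records the request and reclaims the
 * hardware at the next request-processing point for the running vCPU.
 */
static void kvm_guest_lose_pmu(void)
{
        struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

        if (!vcpu)
                return;

        kvm_make_request(KVM_REQ_PMU, vcpu);
}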
By the way, I do not know whether the callback handler should be triggered in the perf core or in the specific PMU HW driver. In the ARM PMU HW driver it is triggered in the driver itself, e.g. in kvm_vcpu_pmu_resync_el0(), but I think it would be better if it were done in the perf core.
I don't think we want to take the approach of perf and KVM guests "fighting" over the PMU. That's effectively what we have today, and it's a mess for KVM because it's impossible to provide consistent, deterministic behavior for the guest. And it's just as messy for perf, which ends up having weird, cumbersome flows that exist purely to try to play nice with KVM.
With the existing perf core code, the PMU HW may be accessed by the host in the tick timer interrupt or an IPI function-call interrupt while the VM is running and the PMU has already been granted to the guest. KVM cannot intercept host IPI/timer interrupts, so there is no PMU context switch and there will be a problem.
Regards
Bibo Mao