Re: [PATCH v1 4/4] KVM/vmx: enable lbr for the guest

Andi Kleen <ak@xxxxxxxxxxxxxxx> · Tue, 26 Sep 2017 09:41:14 -0700

> 1) vCPU context switching and guest side task switching are not identical.
> That is, when the vCPU is scheduled out, the guest task on the vCPU may not

Guest task lifetime has nothing to do with this. It's completely independent
of what you do here at the VCPU level.

> run out its time slice yet, so the task will continue to run when the vCPU
> is scheduled in by the host (lbr wasn't save by the guest task when the vCPU
> is scheduled out in this case).
> 
> It is possible to have the vCPU which runs the guest task (in use of lbr)
> scheduled out, followed by a new host task being scheduled in on the pCPU
> to run. It is not guaranteed that the new host task does not use the LBR
> feature on the pCPU.

Sure, it may use the LBR, and the normal perf context switch
will switch it, so everything works fine.

It's like any other per-task LBR user.

> 
> 2) Sometimes, people may want this usage: "perf record -b
> ./qemu-system-x86_64 ...", which will need lbr to be used in KVM as well.

In this obscure case you can disable LBR support for the guest.
The common case is far more important.

It sounds like you didn't do any performance measurements.
I expect the performance of your current solution to be terrible.

E.g. a normal perf PMI does at least 1 MSR read and 4+ MSR writes
for a single counter. With multiple counters it gets worse.

For each of those you'll need to exit. Adding something
to the entry/exit list costs about the same as doing
explicit RD/WRMSRs.

On Skylake we have 32*3=96 MSRs for the LBRs.

So with the 5 exits and entries, you're essentially doing
5*2*96=960 extra MSR accesses for each PMI.

An MSR access costs at least 100 cycles; writes are far more
expensive.
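
To make the estimate easy to re-run with other numbers, here is a rough
back-of-the-envelope sketch. The constants are just the figures quoted in
this mail; the function names are made up for illustration, and treating the
factor of 2 as "one LBR switch on exit plus one on the following entry" is
my reading of the arithmetic, not something that was measured.

```python
# Rough cost model for switching the LBR MSRs on every VM exit/entry.
# All constants are the figures quoted in this mail, not measurements.

LBR_ENTRIES = 32          # LBR stack depth on Skylake
MSRS_PER_ENTRY = 3        # FROM, TO and INFO MSR for each LBR entry
EXITS_PER_PMI = 5         # 1 MSR read + 4 MSR writes, each forcing an exit
SWITCHES_PER_EXIT = 2     # assumption: one LBR switch on exit, one on entry
MIN_CYCLES_PER_MSR = 100  # lower bound per access; writes cost more


def lbr_msrs():
    """Number of LBR MSRs that have to be switched each time."""
    return LBR_ENTRIES * MSRS_PER_ENTRY


def extra_msr_accesses_per_pmi():
    """Extra MSR accesses caused by one guest PMI."""
    return EXITS_PER_PMI * SWITCHES_PER_EXIT * lbr_msrs()


def min_extra_cycles_per_pmi():
    """Lower bound on the extra cycles spent per guest PMI."""
    return extra_msr_accesses_per_pmi() * MIN_CYCLES_PER_MSR


if __name__ == "__main__":
    print("LBR MSRs:", lbr_msrs())
    print("extra MSR accesses per PMI:", extra_msr_accesses_per_pmi())
    print("extra cycles per PMI (lower bound):", min_extra_cycles_per_pmi())
```

With a single counter that already comes to roughly 96000 cycles per PMI as
a lower bound, before counting the higher cost of writes or any additional
counters.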

-Andi