Re: [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable

"Liang, Kan" <kan.liang@xxxxxxxxxxxxxxx> · Mon, 7 Jan 2019 09:22:47 -0500

On 1/5/2019 5:09 AM, Wei Wang wrote:
On 01/04/2019 11:57 PM, Liang, Kan wrote:

On 1/4/2019 4:58 AM, Wei Wang wrote:
On 01/03/2019 12:33 AM, Liang, Kan wrote:

On 12/26/2018 4:25 AM, Wei Wang wrote:
+
+    /*
+     * It could be possible that people have vcpus of old model run on
+     * physical cpus of newer model, for example a BDW guest on a SKX
+     * machine (but not possible to be the other way around).
+     * The BDW guest may not get accurate results on a SKX machine as it
+     * only reads 16 entries of the lbr stack while there are 32 entries
+     * of recordings. So we currently forbid the lbr enabling when the
+     * vcpu and physical cpu see different lbr stack entries.

I think it's not enough to check only the number of entries. The LBR
from/to MSRs may be different even if the number of entries is the
same, e.g. SLM and KNL.

Yes, we could add the comparison of the FROM msrs.
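
To make the point above concrete, here is a minimal user-space sketch
(not code from the patch) of a check that compares the FROM/TO MSR bases
in addition to the entry count. struct lbr_desc, lbr_compatible() and
the two descriptors are hypothetical names introduced for illustration.
The MSR addresses are assumed to match MSR_LBR_CORE_FROM/TO and
MSR_LBR_NHM_FROM/TO in msr-index.h and should be verified against the
tree.

```c
#include <stdbool.h>

/* Hypothetical descriptor; the kernel keeps these fields in struct x86_pmu. */
struct lbr_desc {
    int nr;             /* number of LBR stack entries */
    unsigned int from;  /* base MSR of the FROM stack */
    unsigned int to;    /* base MSR of the TO stack */
};

/* Assumed values: same stack depth, different MSR ranges. */
const struct lbr_desc slm_lbr = { .nr = 8, .from = 0x40,  .to = 0x60 };
const struct lbr_desc knl_lbr = { .nr = 8, .from = 0x680, .to = 0x6c0 };

/* Guest and host LBR layouts match only if all three fields agree. */
bool lbr_compatible(const struct lbr_desc *guest, const struct lbr_desc *host)
{
    return guest->nr == host->nr &&
           guest->from == host->from &&
           guest->to == host->to;
}
```

With these assumed values, a check on the entry count alone would accept
the SLM/KNL pair, while lbr_compatible() rejects it.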

+     */
+    switch (vcpu_model) {

That's a duplicate of intel_pmu_init(). I think it's better to
factor out the common part if you want to check LBR MSRs and
entries. Then we don't need to add the same code in two different
places when enabling new platforms.

Yes, I thought about this, but intel_pmu_init() does a lot more
things in each "case xx". Any thoughts on how to factor them out?

I think we may only need to move the
"switch (boot_cpu_data.x86_model) { ... }" to a new function, e.g.
__intel_pmu_init(int model, struct x86_pmu *x86_pmu)

In __intel_pmu_init, if the model != boot_cpu_data.x86_model, you only
need to update x86_pmu.*. Just ignore global settings, e.g.
hw_cache_event_ids, mem_attr, extra_attr etc.

Thanks for sharing. I understand the point of maintaining those models
in one place, but this factor-out doesn't seem very elegant to me, like
below:

__intel_pmu_init(int model, struct x86_pmu *x86_pmu)
{
    ...
    switch (model)
    case INTEL_FAM6_NEHALEM:
    case INTEL_FAM6_NEHALEM_EP:
    case INTEL_FAM6_NEHALEM_EX:
        intel_pmu_lbr_init(x86_pmu);
        if (model != boot_cpu_data.x86_model)
            return;

        /* A lot of other initialization follows, like below... */
        memcpy(hw_cache_event_ids, nehalem_hw_cache_event_ids,
               sizeof(hw_cache_event_ids));
        memcpy(hw_cache_extra_regs, nehalem_hw_cache_extra_regs,
               sizeof(hw_cache_extra_regs));
        x86_pmu->event_constraints = intel_nehalem_event_constraints;
        x86_pmu->pebs_constraints = intel_nehalem_pebs_event_constraints;
        x86_pmu->enable_all = intel_pmu_nhm_enable_all;
        x86_pmu->extra_regs = intel_nehalem_extra_regs;
        ...

    case ...
}
We need to insert "if (model != boot_cpu_data.x86_model)" in every "case xx".

What would be the rationale for doing only lbr_init for "x86_pmu"
when model != boot_cpu_data.x86_model?
(It looks more like a workaround to factor out the function and get what
we want.)

I thought the new function may be extended to support a fake PMU, as
below. It's not only for LBR. The PMU has many CPU-specific features.
The function can be used for other features if you want to check
compatibility in the future. But I don't have an example now.

__intel_pmu_init(int model, struct x86_pmu *x86_pmu)
{
    bool fake_pmu = model != boot_cpu_data.x86_model;
    ...
    switch (model)
    case INTEL_FAM6_NEHALEM:
    case INTEL_FAM6_NEHALEM_EP:
    case INTEL_FAM6_NEHALEM_EX:
        intel_pmu_lbr_init(x86_pmu);
        x86_pmu->event_constraints = intel_nehalem_event_constraints;
        x86_pmu->pebs_constraints = intel_nehalem_pebs_event_constraints;
        x86_pmu->enable_all = intel_pmu_nhm_enable_all;
        x86_pmu->extra_regs = intel_nehalem_extra_regs;

        if (fake_pmu)
            return;

        /* Global variables should not be updated for a fake PMU */
        memcpy(hw_cache_event_ids, nehalem_hw_cache_event_ids,
               sizeof(hw_cache_event_ids));
        memcpy(hw_cache_extra_regs, nehalem_hw_cache_extra_regs,
               sizeof(hw_cache_extra_regs));
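
The fake-PMU idea above can be mocked up in user space roughly as
follows. This is only a sketch under stated assumptions, not kernel
code: struct pmu_desc, pmu_init_model(), the MODEL_* constants, the
cache-id tables and the LBR depths are all hypothetical stand-ins for
struct x86_pmu, boot_cpu_data.x86_model and hw_cache_event_ids.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for kernel definitions, for illustration only. */
enum { MODEL_A = 1, MODEL_B = 2 };

struct pmu_desc {
    int lbr_nr;                     /* per-PMU state, safe to fill for any model */
};

static int boot_model = MODEL_B;    /* stands in for boot_cpu_data.x86_model */
static int global_cache_ids[2];     /* stands in for hw_cache_event_ids */
static const int model_a_cache_ids[2] = { 1, 2 };
static const int model_b_cache_ids[2] = { 3, 4 };

/*
 * Fill *pmu for the given model. Global state is written only when the
 * model is the one the host booted on. Returns 0 on success and -1 for
 * an unknown model.
 */
int pmu_init_model(int model, struct pmu_desc *pmu)
{
    bool fake_pmu = model != boot_model;

    switch (model) {
    case MODEL_A:
        pmu->lbr_nr = 16;
        if (fake_pmu)
            return 0;
        /* Global variables are not updated for a fake PMU. */
        memcpy(global_cache_ids, model_a_cache_ids,
               sizeof(global_cache_ids));
        return 0;
    case MODEL_B:
        pmu->lbr_nr = 32;
        if (fake_pmu)
            return 0;
        memcpy(global_cache_ids, model_b_cache_ids,
               sizeof(global_cache_ids));
        return 0;
    default:
        return -1;
    }
}
```

A caller on the KVM side could then call pmu_init_model() with the guest
model and a local struct pmu_desc, and compare the result with the host
descriptor, without a second per-model list to keep in sync.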

I would prefer keeping them separate, as in this patch, for now; it is
logically clearer to me.

But it will be a maintenance problem. Perf developers will probably
forget to update the list in KVM. I think you will have to check the
perf code regularly.

Thanks,
Kan