> On Oct 7, 2021, at 8:34 PM, Like Xu <like.xu.linux@xxxxxxxxx> wrote:
>
> On 30/9/2021 4:05 am, Song Liu wrote:
>> Hi Kan,
>>> On Sep 29, 2021, at 9:35 AM, Liang, Kan <kan.liang@xxxxxxxxx> wrote:
>>>
>>>>>> - get confirmation that clearing GLOBAL_CTRL is suffient to supress
>>>>>> PEBS, in which case we can simply remove the PEBS_ENABLE clear.
>>>>>
>>>>> How should we confirm this? Can we run some tests for this? Or do we
>>>>> need hardware experts' input for this?
>>>>
>>>> I'll put it on the list to ask the hardware people when I talk to them next. But
>>>> maybe Kan or Andi know without asking.
>>>
>>> If the GLOBAL_CTRL is explicitly disabled, the counters do not count anymore.
>>> It doesn't matter if PEBS is enabled or not.
>>>
>>> See 6c1c07b33eb0 ("perf/x86/intel: Avoid unnecessary PEBS_ENABLE MSR
>>> access in PMI "). We optimized the PMU handler base on it.
>> Thanks for these information!
>> IIUC, all we need is the following on top of bpf-next/master:
>> diff --git i/arch/x86/events/intel/core.c w/arch/x86/events/intel/core.c
>> index 1248fc1937f82..d0d357e7d6f21 100644
>> --- i/arch/x86/events/intel/core.c
>> +++ w/arch/x86/events/intel/core.c
>> @@ -2209,7 +2209,6 @@ intel_pmu_snapshot_branch_stack(struct perf_branch_entry *entries, unsigned int
>>         /* must not have branches... */
>>         local_irq_save(flags);
>>         __intel_pmu_disable_all(false); /* we don't care about BTS */
>
> If the value passed in is true, does it affect your use case?
>
>> -       __intel_pmu_pebs_disable_all();
>
> In that case, we can reuse "static __always_inline void intel_pmu_disable_all(void)"
> regardless of whether PEBS is supported or enabled inside the guest and the host ?
>
>>         __intel_pmu_lbr_disable();
>
> How about using intel_pmu_lbr_disable_all() to cover Arch LBR?

We are using LBR without PMI, so there isn't any hardware mechanism to
stop the LBR; we have to stop it in software. There is always a delay
between when the event triggers and when the LBR is stopped.
In this window, the LBR is still running and old entries are being
replaced by new ones. We actually need the old entries from before the
triggering event, so the key design goal here is to minimize the number
of branch instructions between when the event triggers and when the LBR
is stopped.

Here, both __intel_pmu_disable_all(false) and __intel_pmu_lbr_disable()
are used to optimize for this goal: the fewer branch instructions the
better.

After removing __intel_pmu_pebs_disable_all() from
intel_pmu_snapshot_branch_stack(), we found quite a few LBR entries in
extable-related code. With these entries, snapshot branch stack is not
really useful in the VM, because all the interesting entries are flushed
out by them. I am not sure how to optimize this further. Do you have any
suggestions?

Thanks,
Song
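
For illustration only, the window described above (branches retired
between the trigger and the LBR stop evicting pre-trigger entries) can be
modeled with a toy user-space ring buffer. This is a sketch, not kernel
code: the depth of 32 is an assumption (real LBR depth depends on the
CPU), and every toy_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed depth for this sketch; real LBR depth depends on the CPU. */
#define LBR_DEPTH 32

enum toy_tag { TOY_PRE_TRIGGER, TOY_POST_TRIGGER };

/* Toy LBR: a ring buffer that keeps only the most recent branches. */
struct toy_lbr {
	enum toy_tag tag[LBR_DEPTH];
	size_t head;     /* next slot to overwrite */
	size_t recorded; /* total branches recorded so far */
};

/* Record one branch, overwriting the oldest entry once the buffer is full. */
static void toy_lbr_record(struct toy_lbr *lbr, enum toy_tag tag)
{
	assert(lbr != NULL);
	lbr->tag[lbr->head] = tag;
	lbr->head = (lbr->head + 1) % LBR_DEPTH;
	lbr->recorded++;
}

/*
 * Record 'pre' branches before the trigger and 'post' branches between
 * the trigger and the software stop, then count how many pre-trigger
 * entries are still present in the buffer.
 */
static size_t toy_lbr_useful_entries(size_t pre, size_t post)
{
	struct toy_lbr lbr = { .head = 0, .recorded = 0 };
	size_t i, valid, useful = 0;

	for (i = 0; i < pre; i++)
		toy_lbr_record(&lbr, TOY_PRE_TRIGGER);
	for (i = 0; i < post; i++)
		toy_lbr_record(&lbr, TOY_POST_TRIGGER);

	/* Until the buffer wraps, only slots 0..recorded-1 hold data. */
	valid = lbr.recorded < LBR_DEPTH ? lbr.recorded : LBR_DEPTH;
	for (i = 0; i < valid; i++)
		if (lbr.tag[i] == TOY_PRE_TRIGGER)
			useful++;
	return useful;
}
```

Under these assumptions, toy_lbr_useful_entries(100, 4) leaves 28 of the
32 entries from before the trigger, while 32 or more post-trigger
branches leave none, which is why every extra branch in the stop path
matters.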