Hi Peter,

Thanks for your quick response!

> On Aug 18, 2021, at 2:15 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Aug 17, 2021 at 06:29:37PM -0700, Song Liu wrote:
>> The typical way to access LBR is via hardware perf_event. For CPUs with
>> FREEZE_LBRS_ON_PMI support, PMI could capture reliable LBR. On the other
>> hand, LBR could also be useful in non-PMI scenario. For example, in
>> kretprobe or bpf fexit program, LBR could provide a lot of information
>> on what happened with the function.
>>
>> In this RFC, we try to enable LBR for BPF program. This works like:
>>   1. Create a hardware perf_event with PERF_SAMPLE_BRANCH_* on each CPU;
>>   2. Call a new bpf helper (bpf_get_branch_trace) from the BPF program;
>>   3. Before calling this bpf program, the kernel stops LBR on local CPU,
>>      make a copy of LBR, and resumes LBR;
>>   4. In the bpf program, the helper access the copy from #3.
>>
>> Please see tools/testing/selftests/bpf/[progs|prog_tests]/get_call_trace.c
>> for a detailed example. Note that this process is far from ideal, but it
>> allows quick prototype of this feature.
>>
>> AFAICT, the biggest challenge here is that we are now sharing LBR in PMI
>> and out of PMI, which could trigger some interesting race conditions.
>> However, if we allow some level of missed/corrupted samples, this should
>> still be very useful.
>>
>> Please share your thoughts and comments on this. Thanks in advance!
>
>> +int bpf_branch_record_read(void)
>> +{
>> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> +
>> +	intel_pmu_lbr_disable_all();
>> +	intel_pmu_lbr_read();
>> +	memcpy(this_cpu_ptr(&bpf_lbr_entries), cpuc->lbr_entries,
>> +	       sizeof(struct perf_branch_entry) * x86_pmu.lbr_nr);
>> +	*this_cpu_ptr(&bpf_lbr_cnt) = x86_pmu.lbr_nr;
>> +	intel_pmu_lbr_enable_all(false);
>> +	return 0;
>> +}
>
> Urgghhh.. I so really hate BPF specials like this.

I don't really like this design either.
But it does show that LBR can be very useful in non-PMI scenarios.

> Also, the PMI race
> you describe is because you're doing abysmal layer violations. If you'd
> have used perf_pmu_disable() that wouldn't have been a problem.

Do you mean that instead of disabling/enabling LBR, we should disable/enable
the whole PMU?

>
> I'd much rather see a generic 'fake/inject' PMI facility, something that
> works across the board and isn't tied to x86/intel.

How would that work? Do we have a function to trigger a PMI from software,
and then gather the LBR data after the PMI? This does sound like a much
cleaner solution. Where can I find code examples that fake/inject a PMI?

There is another limitation right now: we need to enable LBR with a
hardware perf event (cycles, etc.). However, unless we use the event for
something else, it wastes a hardware counter. So I was thinking of allowing
a software event, i.e. a dummy event, to enable LBR. Does this idea sound
sane to you?

Thanks,
Song
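
For reference, the hardware event that currently has to be opened just to
keep LBR enabled (step 1 of the quoted RFC, and the counter the last
paragraph describes as wasted) could be set up from user space roughly as
below. This is only a minimal sketch, not the selftest code: the helper
names lbr_event_attr_init() and lbr_event_open() are hypothetical, and the
choice of a cycles event, the 4000 Hz sample frequency, and the branch
filter are assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill in an attr for a hardware cycles event that
 * requests branch-stack (LBR) sampling. All values are illustrative.
 */
static void lbr_event_attr_init(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->freq = 1;			/* sample_freq is a frequency, not a period */
	attr->sample_freq = 4000;	/* assumed value */
	attr->sample_type = PERF_SAMPLE_BRANCH_STACK;
	attr->branch_sample_type = PERF_SAMPLE_BRANCH_KERNEL |
				   PERF_SAMPLE_BRANCH_USER |
				   PERF_SAMPLE_BRANCH_ANY;
}

/*
 * Hypothetical helper: open the event on one CPU for all tasks
 * (pid == -1, cpu >= 0). Returns an fd, or -1 with errno set.
 */
static int lbr_event_open(int cpu)
{
	struct perf_event_attr attr;

	lbr_event_attr_init(&attr);
	return (int)syscall(SYS_perf_event_open, &attr, -1, cpu, -1,
			    PERF_FLAG_FD_CLOEXEC);
}
```

A caller would invoke lbr_event_open() once per online CPU and keep the
returned fds open for as long as the BPF program is attached. Opening a
system-wide event like this typically needs CAP_PERFMON or a permissive
perf_event_paranoid setting, and each fd occupies one hardware counter on
its CPU.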