On Tue, Aug 17, 2021 at 06:29:37PM -0700, Song Liu wrote:
> The typical way to access LBR is via a hardware perf_event. For CPUs with
> FREEZE_LBRS_ON_PMI support, the PMI can capture reliable LBR data. On the
> other hand, LBR can also be useful in non-PMI scenarios. For example, in a
> kretprobe or bpf fexit program, LBR could provide a lot of information
> on what happened inside the function.
>
> In this RFC, we try to enable LBR for BPF programs. This works like:
> 1. Create a hardware perf_event with PERF_SAMPLE_BRANCH_* on each CPU;
> 2. Call a new bpf helper (bpf_get_branch_trace) from the BPF program;
> 3. Before calling this bpf program, the kernel stops LBR on the local CPU,
>    makes a copy of the LBR, and resumes LBR;
> 4. In the bpf program, the helper accesses the copy from #3.
>
> Please see tools/testing/selftests/bpf/[progs|prog_tests]/get_call_trace.c
> for a detailed example. Note that this process is far from ideal, but it
> allows quick prototyping of this feature.
>
> AFAICT, the biggest challenge here is that we are now sharing LBR between
> PMI and non-PMI contexts, which could trigger some interesting race
> conditions. However, if we allow some level of missed/corrupted samples,
> this should still be very useful.
>
> Please share your thoughts and comments on this. Thanks in advance!

> +int bpf_branch_record_read(void)
> +{
> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> +
> +	intel_pmu_lbr_disable_all();
> +	intel_pmu_lbr_read();
> +	memcpy(this_cpu_ptr(&bpf_lbr_entries), cpuc->lbr_entries,
> +	       sizeof(struct perf_branch_entry) * x86_pmu.lbr_nr);
> +	*this_cpu_ptr(&bpf_lbr_cnt) = x86_pmu.lbr_nr;
> +	intel_pmu_lbr_enable_all(false);
> +	return 0;
> +}

Urgghhh.. I so really hate BPF specials like this.

Also, the PMI race you describe is because you're doing abysmal layer
violations. If you'd have used perf_pmu_disable() that wouldn't have
been a problem.

I'd much rather see a generic 'fake/inject' PMI facility, something
that works across the board and isn't tied to x86/intel.
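
[Editor's note: an illustrative, untested sketch of the perf_pmu_disable()
point above. The function name and the 'event'/'dst'/'max' parameters are
made up for the example; 'event' stands for the per-CPU branch-sampling
perf_event created in step 1 of the RFC. It still reaches into intel/lbr
internals (the other objection), and assumes it lives somewhere under
arch/x86/events/ where cpu_hw_events is visible.]

	static int snapshot_branch_stack(struct perf_event *event,
					 struct perf_branch_entry *dst,
					 unsigned int max)
	{
		struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
		unsigned int i, cnt;

		/*
		 * Disable the whole PMU around the copy; a PMI cannot
		 * consume or clobber the LBR state halfway through, so
		 * the snapshot stays consistent. Assumes we are called
		 * with preemption disabled so we stay on this CPU.
		 */
		perf_pmu_disable(event->pmu);

		intel_pmu_lbr_read();	/* drain LBR MSRs into cpuc */
		cnt = min_t(unsigned int, max, cpuc->lbr_stack.nr);
		for (i = 0; i < cnt; i++)
			dst[i] = cpuc->lbr_entries[i];

		perf_pmu_enable(event->pmu);

		return cnt;
	}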