> On Aug 31, 2021, at 9:41 AM, Song Liu <songliubraving@xxxxxx> wrote:
>
>
>
>> On Aug 31, 2021, at 8:32 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>
>> On Mon, Aug 30, 2021 at 02:41:05PM -0700, Song Liu wrote:
>>
>>> @@ -564,6 +565,18 @@ static void notrace inc_misses_counter(struct bpf_prog *prog)
>>> u64 notrace __bpf_prog_enter(struct bpf_prog *prog)
>>> 	__acquires(RCU)
>>> {
>> 	preempt_disable_notrace();
>>
>>> +#ifdef CONFIG_PERF_EVENTS
>>> +	/* Calling migrate_disable costs two entries in the LBR. To save
>>> +	 * some entries, we call perf_snapshot_branch_stack before
>>> +	 * migrate_disable to save some entries. This is OK because we
>>> +	 * care about the branch trace before entering the BPF program.
>>> +	 * If migrate happens exactly here, there isn't much we can do to
>>> +	 * preserve the data.
>>> +	 */
>>> +	if (prog->call_get_branch)
>>> +		static_call(perf_snapshot_branch_stack)(
>>> +			this_cpu_ptr(&bpf_perf_branch_snapshot));
>>
>> Here the comment is accurate, but if you recall the calling context
>> requirements of perf_snapshot_branch_stack from the last patch, you'll
>> see it requires you have at the very least preemption disabled, which
>> you just violated.
>
>>
>> I think you'll find that (on x86 at least) the suggested
>> preempt_disable_notrace() incurs no additional branches.
>>
>> Still, there is the next point to consider...
>>
>>> +#endif
>>> 	rcu_read_lock();
>>> 	migrate_disable();
>>
>> 	preempt_enable_notrace();
>
> Do we want preempt_enable_notrace() after migrate_disable()? It feels a
> little weird to me.
>
>>
>>> 	if (unlikely(__this_cpu_inc_return(*(prog->active)) != 1)) {
>>
>>> @@ -1863,9 +1892,23 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
>>> 	preempt_enable();
>>> }
>>>
>>> +DEFINE_PER_CPU(struct perf_branch_snapshot, bpf_perf_branch_snapshot);
>>> +
>>> static __always_inline
>>> void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
>>> {
>>> +#ifdef CONFIG_PERF_EVENTS
>>> +	/* Calling migrate_disable costs two entries in the LBR. To save
>>> +	 * some entries, we call perf_snapshot_branch_stack before
>>> +	 * migrate_disable to save some entries. This is OK because we
>>> +	 * care about the branch trace before entering the BPF program.
>>> +	 * If migrate happens exactly here, there isn't much we can do to
>>> +	 * preserve the data.
>>> +	 */
>>> +	if (prog->call_get_branch)
>>> +		static_call(perf_snapshot_branch_stack)(
>>> +			this_cpu_ptr(&bpf_perf_branch_snapshot));
>>> +#endif
>>> 	cant_sleep();
>>
>> In the face of ^^^^^^ the comment makes no sense. Still, what are the
>> nesting rules for __bpf_trace_run() and __bpf_prog_enter() ? I'm
>> thinking the trace one can nest inside an occurence of prog, at which
>> point you have pieces.
>
> I think broken LBR records is something we cannot really avoid in case
> of nesting. OTOH, these should be rare cases and will not hurt the results
> in most the use cases.
>
> I should probably tighten the rules in verifier to only apply it for
> __bpf_prog_enter (where we have the primary use case). We can enable it
> for other program types when there are other use cases.

Update about some offline discussion with Alexei and Andrii: we are planning
to move static_call(perf_snapshot_branch_stack) to inside the helper
bpf_get_branch_snapshot. This change has a few benefits:

1. No need for an extra check (prog->call_get_branch) before every program
   (even when the program doesn't use the helper).

2. No need to duplicate code across the different BPF program hooks.

3. BPF programs always run with migrate_disable(), so it is not necessary
   to add an extra preempt_disable_notrace().
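Roughly, the helper could then look something like the sketch below (not the
final patch; the buf/size arguments, the copy into the caller's buffer, and
the nr/entries fields of struct perf_branch_snapshot are placeholders for
illustration, assuming the usual bpf_trace.c context):

BPF_CALL_2(bpf_get_branch_snapshot, void *, buf, u32, size)
{
	struct perf_branch_snapshot *snapshot;
	u32 to_copy;

	/* BPF programs run with migration disabled, so we stay on this
	 * CPU's snapshot buffer for the whole call.
	 */
	snapshot = this_cpu_ptr(&bpf_perf_branch_snapshot);
	static_call(perf_snapshot_branch_stack)(snapshot);

	if (!snapshot->nr)
		return -ENOENT;

	to_copy = min_t(u32, size,
			snapshot->nr * sizeof(struct perf_branch_entry));
	memcpy(buf, snapshot->entries, to_copy);

	return to_copy;
}

With the snapshot taken inside the helper, only programs that actually call
bpf_get_branch_snapshot pay for it, and __bpf_prog_enter() /
__bpf_trace_run() stay unchanged.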
This approach does flush a few more LBR entries, but the result seems ok:

ID: 0 from intel_pmu_lbr_disable_all+58 to intel_pmu_lbr_disable_all+93
ID: 1 from intel_pmu_lbr_disable_all+54 to intel_pmu_lbr_disable_all+58
ID: 2 from intel_pmu_snapshot_branch_stack+88 to intel_pmu_lbr_disable_all+0
ID: 3 from bpf_get_branch_snapshot+28 to intel_pmu_snapshot_branch_stack+0
ID: 4 from <bpf_tramepoline> to bpf_get_branch_snapshot+0
ID: 5 from <bpf_tramepoline> to <bpf_tramepoline>
ID: 6 from __bpf_prog_enter+34 to <bpf_tramepoline>
ID: 7 from migrate_disable+60 to __bpf_prog_enter+9
ID: 8 from __bpf_prog_enter+4 to migrate_disable+0
ID: 9 from __bpf_prog_enter+4 to __bpf_prog_enter+0
ID: 10 from bpf_fexit_loop_test1+22 to __bpf_prog_enter+0
ID: 11 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13
ID: 12 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13
ID: 13 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13
ID: 14 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13
ID: 15 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13

We could save a few more entries by inlining intel_pmu_lbr_disable_all(),
but that is probably not necessary at the moment.

Thanks,
Song