On Mon, Aug 30, 2021 at 02:41:05PM -0700, Song Liu wrote:

> @@ -564,6 +565,18 @@ static void notrace inc_misses_counter(struct bpf_prog *prog)
>  u64 notrace __bpf_prog_enter(struct bpf_prog *prog)
>          __acquires(RCU)
>  {

        preempt_disable_notrace();

> +#ifdef CONFIG_PERF_EVENTS
> +        /* Calling migrate_disable costs two entries in the LBR. To save
> +         * some entries, we call perf_snapshot_branch_stack before
> +         * migrate_disable to save some entries. This is OK because we
> +         * care about the branch trace before entering the BPF program.
> +         * If migrate happens exactly here, there isn't much we can do to
> +         * preserve the data.
> +         */
> +        if (prog->call_get_branch)
> +                static_call(perf_snapshot_branch_stack)(
> +                        this_cpu_ptr(&bpf_perf_branch_snapshot));

Here the comment is accurate, but if you recall the calling context
requirements of perf_snapshot_branch_stack from the last patch, you'll
see it requires you have at the very least preemption disabled, which
you just violated.

I think you'll find that (on x86 at least) the suggested
preempt_disable_notrace() incurs no additional branches.

Still, there is the next point to consider...

> +#endif
>          rcu_read_lock();
>          migrate_disable();

        preempt_enable_notrace();

>          if (unlikely(__this_cpu_inc_return(*(prog->active)) != 1)) {

> @@ -1863,9 +1892,23 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
>          preempt_enable();
>  }
>
> +DEFINE_PER_CPU(struct perf_branch_snapshot, bpf_perf_branch_snapshot);
> +
>  static __always_inline
>  void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
>  {
> +#ifdef CONFIG_PERF_EVENTS
> +        /* Calling migrate_disable costs two entries in the LBR. To save
> +         * some entries, we call perf_snapshot_branch_stack before
> +         * migrate_disable to save some entries. This is OK because we
> +         * care about the branch trace before entering the BPF program.
> +         * If migrate happens exactly here, there isn't much we can do to
> +         * preserve the data.
> +         */
> +        if (prog->call_get_branch)
> +                static_call(perf_snapshot_branch_stack)(
> +                        this_cpu_ptr(&bpf_perf_branch_snapshot));
> +#endif
>          cant_sleep();

In the face of the cant_sleep() right above, the comment makes no
sense. Still, what are the nesting rules for __bpf_trace_run() and
__bpf_prog_enter()? I'm thinking the trace one can nest inside an
occurrence of prog, at which point both write the same per-CPU
bpf_perf_branch_snapshot buffer and you have pieces.

>          rcu_read_lock();
>          (void) bpf_prog_run(prog, args);
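
To make the suggested ordering concrete, here is a rough sketch of
__bpf_prog_enter() with the snapshot taken while preemption is still
disabled (taking the prog->call_get_branch flag and the
bpf_perf_branch_snapshot per-CPU buffer from this patch as given; the
inc_misses_counter()/bpf_prog_start_time() tail is the existing code):

u64 notrace __bpf_prog_enter(struct bpf_prog *prog)
        __acquires(RCU)
{
        /* Preemption is off across the snapshot, which satisfies the
         * calling context requirement of perf_snapshot_branch_stack.
         */
        preempt_disable_notrace();
#ifdef CONFIG_PERF_EVENTS
        /* Snapshot the LBR before migrate_disable()/rcu_read_lock()
         * add more branches of our own.
         */
        if (prog->call_get_branch)
                static_call(perf_snapshot_branch_stack)(
                        this_cpu_ptr(&bpf_perf_branch_snapshot));
#endif
        rcu_read_lock();
        migrate_disable();
        preempt_enable_notrace();

        if (unlikely(__this_cpu_inc_return(*(prog->active)) != 1)) {
                inc_misses_counter(prog);
                return 0;
        }
        return bpf_prog_start_time();
}

Note this only fixes the calling context; the nesting question above is
separate, since a single per-CPU snapshot buffer can still be
overwritten when the two paths interleave on one CPU.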